PREFACE TO THE SECOND EDITION



With the first edition out of print, we decided to arrange for republi- 
cation of Denumerable Markov Chains with additional bibliographic 
material. The new edition contains a section Additional Notes that 
indicates some of the developments in Markov chain theory over the 
last ten years. As in the first edition and for the same reasons, we have 
resisted the temptation to follow the theory in directions that deal with 
uncountable state spaces or continuous time. A section entitled 
Additional References complements the Additional Notes. 

J. W. Pitman pointed out an error in Theorem 9-53 of the first 
edition, which we have corrected. More detail about the correction 
appears in the Additional Notes. Aside from this change, we have left 
intact the text of the first eleven chapters. 

The second edition contains a twelfth chapter, written by David
Griffeath, on Markov random fields. We are grateful to Ted Cox for
his help in preparing this material. Notes for the chapter appear in the 
section Additional Notes. 

J.G.K., J.L.S., A.W.K. 

March, 1976 
PREFACE TO THE FIRST EDITION



Our purpose in writing this monograph has been to provide a syste- 
matic treatment of denumerable Markov chains, covering both the 
foundations of the subject and some topics in potential theory and 
boundary theory. Much of the material included is now available only 
in recent research papers. The book’s theme is a discussion of relations
among what might be called the descriptive quantities associated with 
Markov chains — probabilities of events and means of random variables 
that give insight into the behavior of the chains. 

We make no pretense of being complete. Indeed, we have omitted 
many results which we feel are not directly related to the main theme, 
especially when they are available in easily accessible sources. Thus, 
for example, we have only touched on independent trials processes, 
sums of independent random variables, and limit theorems. On the other 
hand, we have made an attempt to see that the book is self-contained, 
in order that a mathematician can read it without continually referring 
to outside sources. It may therefore prove useful in graduate seminars. 

Denumerable Markov chains are in a peculiar position in that the 
methods of functional analysis which are used in handling more general 
chains apply only to a relatively small class of denumerable chains. In- 
stead, another approach has been necessary, and we have chosen to use 
infinite matrices. They simplify the notation, shorten statements and 
proofs of theorems, and often suggest new results. They also enable one 
to exploit the duality between measures and functions to the fullest. 

The monograph divides naturally into four parts, the first three con- 
sisting of three chapters each and the fourth containing the last two 
chapters. 

Part I provides background material for the theory of Markov chains. 
It is included to help make the book self-contained and should facilitate 
the use of the book in advanced seminars. Part II contains basic results 
on denumerable Markov chains, and Part III deals with discrete poten- 
tial theory. Part IV treats boundary theory for both transient and re- 
current chains. The analytical prerequisites for the two chapters in this 
last part exceed those for the earlier parts of the book and are not all 
included in Part I. Primarily, Part IV presumes that the reader is 
familiar with the topology and measure theory of compact metric 
spaces, in addition to the contents of Part I. 
Two chapters, Chapters 1 and 7, require special comments. Chapter 1
contains prerequisites from the theory of infinite matrices and
some other topics in analysis. In it Sections 1 and 5 are the most
important for an understanding of the later chapters. Chapter 7, entitled
“Introduction to Potential Theory,” is a chapter of motivation and should
be read as such. Its intent is to point out why classical potential theory 
and Markov chains should be at all related. 

The book contains 239 problems, some at the end of each chapter 
except Chapters 1 and 7. 

For the most part, historical references do not appear in the text 
but are collected in one segment at the end of the book. 

Some remarks about notation may be helpful. We use sparingly 
the word “Theorem” to indicate the most significant results of the 
monograph; other results are labeled “Lemma,” “Proposition,” and 
“Corollary” in accordance with common usage. The end of each proof 
is indicated by a blank line. Several examples of Markov chains are 
worked out in detail and recur at intervals; although there is normally 
little interdependence between distinct examples, different instances of 
the same example may be expected to build on one another. 

A complete list of symbols used in the book appears in a list separate 
from the index. 

We wish to thank Susan Knapp for typing and proof-reading the 
manuscript. 

We are doubly indebted to the National Science Foundation: First, 
a number of original results and simplified proofs of known results were 
developed as part of a research project supported by the Foundation. 
And second, we are grateful for the support provided toward the 
preparation of this manuscript. 

J. G. K. 

J. L. S. 

A. W. K. 

Dartmouth College 

Massachusetts Institute of Technology 
CHAPTER 1



PREREQUISITES FROM ANALYSIS 



1. Denumerable matrices 

The word denumerable in the sequel means finite or countably 
infinite. Let M and N be two non-empty denumerable sets. A 
matrix is a function with domain the set of ordered pairs (m, n), where
$m \in M$ and $n \in N$, and with range a subset of the extended real number
system, that is, the reals with $+\infty$ and $-\infty$ adjoined. We call the sets M
and N index sets. The matrix is called a finite matrix if both M and N 
are finite sets. 

To say that the m-nth entry of the matrix is x or is equal to x, we
mean that the value of the function on the pair (m, n) is x. A matrix
is said to be non-negative if all of its entries are non-negative, and it is
said to be positive if all of its entries are positive. We agree to use
upper-case italic letters to stand for matrices. If A is a matrix, we
denote the m-nth entry of A by $A_{mn}$. Some examples of matrices are
as follows:

(1) If all entries of a matrix are equal to zero, we say that the matrix 
is the zero matrix, denoted by 0. 

(2) A matrix for which M and N are the same set is called a square
matrix. The entries corresponding to m = n are diagonal entries; other
entries are off-diagonal entries.

(3) A square matrix whose off-diagonal entries all equal zero is a
diagonal matrix. The diagonal matrix obtained from a square matrix
A by setting all of its off-diagonal entries equal to zero is denoted $A_{dg}$.

(4) A diagonal matrix whose diagonal entries are all equal to one is
called the identity matrix, denoted by I.

(5) A matrix whose second index set contains only one element is 
called a column vector. If we wish to distinguish a column vector from 
an arbitrary matrix, we shall denote the former by a lower-case italic 
letter. 
(6) A matrix whose first index set contains only one element is called
a row vector. If we wish to distinguish a row vector from an arbitrary 
matrix, we shall denote the former by a lower-case Greek letter. 

(7) If A is a matrix defined on index sets M and N, define a matrix
$A^T$, called the transpose of A, to have index sets N and M and to have
entries given by $(A^T)_{nm} = A_{mn}$. The transpose of the transpose of A is
simply A.

(8) The column vector all of whose entries are equal to one is denoted
$1$; the row vector with all entries one is $1^T$. A matrix other than a row
or column vector which has all entries equal to one is denoted by E.

(9) If A is an arbitrary matrix and c is a real number, cA is the
matrix whose entries are given by $(cA)_{mn} = cA_{mn}$.

(10) The matrix $-A$ is defined to be the matrix $(-1)A$.

(11) A constant (column) vector is a vector of the form c1 for some 
extended real number c. 

(12) A bounded vector is a vector all of whose entries are less than or 
equal in absolute value to some finite real number c. 

Two matrices A and B are equal, written A = B, if they have the
same index sets and if $A_{mn} = B_{mn}$ for every m and n. Inequalities are
defined similarly. For example, $A \ge B$ if A and B have the same
index sets and if $A_{mn} \ge B_{mn}$ for every m and n. In particular,
non-negative matrices are those for which $A \ge 0$, and positive matrices are
those for which $A > 0$.

Addition of matrices is defined for matrices A and B having the same 
index sets M and N. Their sum C = A + B has the same index sets, 
and addition is defined entry-by-entry: 

$$C_{mn} = A_{mn} + B_{mn}.$$

The sum C = A + B is well defined if no entry of C is given by $\infty - \infty$
or by $-\infty + \infty$. We leave the verification of the following properties
of matrices with index sets M and N to the reader : 

(1) $A + 0 = A$ for every A.

(2) For every A having all entries finite, $A + (-A) = 0$.

(3) For any matrices A, B, and C,

$$A + (B + C) = (A + B) + C$$

if the indicated sums on at least one side of the equality are 
well defined. 

Up to now, we have imposed no orderings on our index sets, and in 
fact nothing we have done so far necessitates doing so. We shall define 
even matrix multiplication shortly in a way that requires no ordering. 
There is, however, a standard way of representing matrices as rectangular
arrays, and for this purpose one normally orders the index
sets with the usual ordering on the non-negative integers. The
elements of the index sets are thus numbered 0, 1, 2, . . . either up to
some integer r if the index set is finite or indefinitely if the index set is
infinite. Under such orderings of its index sets, a matrix A is represented as

$$A = \begin{pmatrix} A_{00} & A_{01} & \cdots \\ A_{10} & A_{11} & \cdots \\ \vdots & \vdots & \end{pmatrix}.$$

We note that other representations are possible if at least one of the 
index sets is infinite; such representations come from ordering the 
index sets with an order type other than that of the non-negative 
integers. We shall meet another order type with its corresponding 
representation at the end of this section. We point out, however, that 
orderings are completely irrelevant as far as the fundamental properties 
of matrices are concerned, and we shall have little occasion to refer to 
them again. 

For any real number $a_m$, define $a_m^+$ and $a_m^-$ by

$$a_m^+ = \max(a_m, 0)$$

$$a_m^- = -\min(a_m, 0).$$

The sum of denumerably many non-negative terms $\sum_{m \in M} a_m^+$ or
$\sum_{m \in M} a_m^-$ always exists independently of any ordering on M. Therefore,
we say that $\sum_{m \in M} a_m = \sum_{m \in M} a_m^+ - \sum_{m \in M} a_m^-$ is well defined if
not both of $\sum_{m \in M} a_m^+$ and $\sum_{m \in M} a_m^-$ are infinite.
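
The convention can be tried out on a finite list of terms. The following is only a sketch: the function name `signed_sum` is hypothetical, a finite list stands in for a denumerable family, and floating-point infinity stands in for the extended real value $+\infty$.

```python
import math

def signed_sum(terms):
    """Sum terms as (sum of positive parts) - (sum of negative parts).

    Returns None when both partial sums are infinite, that is, when the
    sum is not well defined in the sense described above.
    """
    pos = sum(max(a, 0.0) for a in terms)    # sum of the a_m^+
    neg = sum(-min(a, 0.0) for a in terms)   # sum of the a_m^-
    if math.isinf(pos) and math.isinf(neg):
        return None                          # infinity minus infinity
    return pos - neg

finite_case = signed_sum([1, -2, 3])
infinite_case = signed_sum([math.inf, -1])
undefined_case = signed_sum([math.inf, -math.inf])
```

Because the positive and negative parts are summed separately, the result does not depend on the order in which the terms are listed.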

Definition 1-1: Let A be a matrix with index sets K and M, and let
B be a matrix with index sets M and N. Suppose the sums

$$\sum_{m \in M} A_{km} B_{mn}$$

are well defined for every k and every n. Then the matrix product
C = AB is said to be well defined; its index sets are K and N, and its
entries are given by $C_{kn} = \sum_{m \in M} A_{km} B_{mn}$. Matrix multiplication is not
defined unless all of these properties hold.
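
For finite index sets the definition can be sketched directly, with no ordering imposed on the index sets. The representation below (a dictionary keyed by index pairs, and the name `mat_mul`) is an assumption made for illustration, and the entries are taken to be finite so that every sum is automatically well defined.

```python
def mat_mul(A, B, K, M, N):
    """Product C = AB for A on K x M and B on M x N.

    Matrices are dicts keyed by (row index, column index); the index
    sets are plain Python sets, so no ordering is used.
    """
    return {(k, n): sum(A[k, m] * B[m, n] for m in M)
            for k in K for n in N}

K, M, N = {"a", "b"}, {"x", "y"}, {"u"}
A = {("a", "x"): 1, ("a", "y"): 2, ("b", "x"): 0, ("b", "y"): -1}
B = {("x", "u"): 3, ("y", "u"): 4}
C = mat_mul(A, B, K, M, N)
```

Here the index sets are sets of labels rather than integers, which mirrors the remark that matrix multiplication requires no ordering.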

Most of the propositions and theorems about matrices that we shall 
deal with are statements of equality of matrices A = B. Such state- 
ments are really just assertions about the equality of corresponding 
entries of A and B, and a proof that A equals B need only contain an 
argument that an arbitrary entry of A equals the corresponding entry
of B. With this understanding, we see that the proof of the additive 
properties of matrices is reduced to a trivial repetition of the properties 
of real numbers. Propositions about multiplication, however, when 
looked at entry-by-entry involve a new idea. 

Let A be a matrix with index sets M and N and let m and n be fixed
elements of M and N, respectively. The mth row of A is defined to be 
the restriction of the function A to the domain of pairs (m, s), where s 
runs through the set N. Similarly the nth column of A is defined to
be the restriction of the function A to the domain of pairs (t, n), where 
t runs through the elements of the set M. We note that the mth row of 
a matrix is a row vector and that the nth column is a column vector. 
With these conventions matrices can be thought of as sets of rows or as 
sets of columns, and addition of matrices is simply addition of corre- 
sponding rows or columns of the matrices involved. Furthermore, the 
k-nth entry in the matrix product of A and B is the product of the kth
row of A by the nth column of B and is of the form $\sum_{m \in M} \pi_m f_m$, where $\pi$
is a row vector and $f$ is a column vector. That is, propositions about
matrix multiplication, when proved entry-by-entry, may sometimes be 
proved by considering only the product of a row vector and a column 
vector. 

Because of the correspondence of row vectors to rows and column 
vectors to columns, we shall agree to call the domain of a row vector or 
a column vector the elements of a single index set. 

Connected with any definition of multiplication are five properties 
which may or may not be valid for the structure being considered. All 
five of the properties do hold for the real numbers, and we state them 
in this context: 

(1) Existence and uniqueness of a multiplicative identity. The real
number 1 satisfies c1 = 1c = c for every c.

(2) Commutativity: ab = ba

(3) Distributivity: a(b + c) = ab + ac

(a + b)c = ac + bc

(4) Associativity: a(bc) = (ab)c

(5) Existence and uniqueness of multiplicative inverses of all 
non-zero elements. 

We can easily settle whether the first two properties hold for matrix 
multiplication. First, the identity matrix I plays the role of the 
multiplicative identity, and the identity is clearly unique. Second, 
commutativity can be expected to fail except in special cases because it 
is not even necessary for the index sets of two matrices to agree properly 
after the order of multiplication has been reversed. 
The validity of the third property, that of distributivity, is the
content of the next proposition. 

Proposition 1-2: If A, B, and C are matrices and if AB, AC, and
AB + AC are well defined, then A(B + C) = AB + AC. Similarly
(D + E)F = DF + EF if DF, EF, and DF + EF are all well
defined.

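Proof: We prove only the first assertion. We may assume that A is
a row vector $\pi$ and that B and C are column vectors $f$ and $g$. Then

$$\begin{aligned}
\pi f + \pi g &= \sum_{m \in M} \pi_m f_m + \sum_{m \in M} \pi_m g_m \\
&= \sum_{m \in M} (\pi_m f_m + \pi_m g_m) \\
&= \sum_{m \in M} \pi_m (f_m + g_m) \\
&= \pi(f + g).
\end{aligned}$$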

The fourth and fifth properties are related and nontrivial. Associa- 
tivity does not always hold, but useful sufficient criteria for its validity 
are known. For an example of how associativity may fail, let A be a
matrix whose index sets are the non-negative integers and whose entries
are given by

[matrix display missing from this copy]

whereas

$$(1^T A)1 = 1.$$

All the products involved are well defined, but the multiplications do 
not associate. 
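
The displayed matrix did not survive in this copy, so the following sketch rests on an assumption: it uses the matrix with 1 on the diagonal and −1 immediately to the right of the diagonal, which is consistent with the surviving conclusion that the second product equals 1. The function names are hypothetical. Every row and every column of this matrix has at most two non-zero entries, so the inner sums are computed exactly; only the outer sums are truncated, and all of the omitted terms are zero.

```python
def a(i, j):
    """Entry of the assumed matrix: 1 on the diagonal, -1 just right of it."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0

def row_sum(i):
    # (A1)_i: row i is non-zero only in columns i and i + 1.
    return a(i, i) + a(i, i + 1)

def col_sum(j):
    # (1^T A)_j: column j is non-zero only in rows j - 1 and j.
    return a(j, j) + (a(j - 1, j) if j > 0 else 0)

N = 1000
A1 = [row_sum(i) for i in range(N)]        # first N entries of A1
oneTA = [col_sum(j) for j in range(N)]     # first N entries of 1^T A
left = sum(A1)       # 1^T (A 1), all omitted terms being zero
right = sum(oneTA)   # (1^T A) 1, all omitted terms being zero
```

Under this assumption every row sums to zero while the column sums are 1, 0, 0, . . . , so the two groupings of the triple product give 0 and 1 respectively.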

We shall not consider the problem of existence of inverses, but
uniqueness rests upon associativity. For suppose AB = BA = AC =
CA = I. Since AC = I, we have B(AC) = B, and since BA = I, we
have (BA)C = C. Therefore, B = C if and only if B(AC) = (BA)C.
With this note we proceed with some sufficient conditions for 
associativity. 
Lemma 1-3: Let $b_{ij}$ be a sequence of real numbers nondecreasing
with i and with j. Then $\lim_i \lim_j b_{ij} = \lim_j \lim_i b_{ij}$, both possibly
infinite.

Proof: In the extended sense $\lim_i b_{ij} = L_j$ exists and so does
$\lim_j b_{ij} = L_i^*$. Now $\{L_j\}$ is nondecreasing, for if $L_j > L_{j+k}$ then for i
sufficiently large $b_{ij} > L_{j+k} \ge b_{i,j+k}$, which is impossible. Similarly
$\{L_i^*\}$ is nondecreasing, so that $\lim_j L_j = L$ and $\lim_i L_i^* = L^*$ exist in the
extended sense. If $L \ne L^*$, we may assume $L^* > L$ and hence L is
finite. Then there exists an i such that $L_i^* > L$. Hence

$$\begin{aligned}
L_i^* &> L \\
&\ge L_j \quad \text{for all } j \\
&\ge b_{ij} \quad \text{for all } j.
\end{aligned}$$

Thus $b_{ij}$ is bounded away from its limit on j, a contradiction.

Following the example of Lemma 1-3, we agree that all limits referred 
to in the future are on the extended real line. 

Proposition 1-4: Non-negative matrices associate under multi- 
plication. 

Proof: Since we are interested in each entry separately of a triple
product, we may assume that we are to show that $\pi(Af) = (\pi A)f$,
where $\pi \ge 0$, $A \ge 0$, $f \ge 0$, $\pi$ is a row vector, $f$ is a column vector, and
the index sets are subsets of the non-negative integers. Then

$$\pi(Af) = \sum_m \sum_n \pi_m A_{mn} f_n$$

and

$$(\pi A)f = \sum_n \sum_m \pi_m A_{mn} f_n.$$

Set $b_{ij} = \sum_{m=0}^{i} \sum_{n=0}^{j} \pi_m A_{mn} f_n = \sum_{n=0}^{j} \sum_{m=0}^{i} \pi_m A_{mn} f_n$ and apply Lemma
1-3 to complete the proof.

If A is an arbitrary matrix we define $A^+$ and $A^-$ by the equations

$$(A^+)_{mn} = \max\{A_{mn}, 0\}$$

$$(A^-)_{mn} = -\min\{A_{mn}, 0\}.$$

Then $A = A^+ - A^-$, $A^+ \ge 0$, and $A^- \ge 0$. For row and column
vectors, the matrices $\pi^+$, $\pi^-$, $f^+$, and $f^-$ are defined analogously.
We note that if $Af$ is well defined, then so are $Af^+$ and $Af^-$. Powers
of matrices are defined inductively by $A^0 = I$, $A^n = A(A^{n-1})$. The
absolute value of a matrix is $|A| = A^+ + A^-$. Proposition 1-4 now
gives us five corollaries.

Corollary 1-5: Matrices associate if the product of their absolute 
values has all finite entries. 

Proof: We are again to prove that $\pi(Af) = (\pi A)f$, and we do so by
setting $\pi = \pi^+ - \pi^-$, $A = A^+ - A^-$, and $f = f^+ - f^-$, applying
distributivity, and using Proposition 1-4 on the resulting non-negative
matrices.

Corollary 1-6: Finite matrices with finite entries associate. 

Proof: The result follows from Corollary 1-5. 

Corollary 1-7: If A and B are non-negative matrices and $f$ is a column
vector such that $A(Bf)$ and $(AB)f$ are both well defined, then $A(Bf) =
(AB)f$. In particular, if C is a non-negative matrix, if n > 0, and if
$C^n f$ and $C(C^{n-1} f)$ are well defined, then $C^n f = C(C^{n-1} f)$.

Proof: Consider $f^+$ and $f^-$ separately and apply Proposition 1-4.
For the second assertion, set A = C and $B = C^{n-1}$.

Similarly one proves two final corollaries. 

Corollary 1-8: If A, B, C, and D are non-negative matrices such that
either

(1) ABD, AB, and BD, or

(2) ACD, AC, and CD

are finite-valued, then (A(B − C))D = A((B − C)D).

Corollary 1-9: If A, B, and C are matrices such that either

(1) A has only finitely many non-zero entries in each row,

(2) C has only finitely many non-zero entries in each column, or

(3) B has only finitely many non-zero entries,

and if (AB)C and A(BC) are well defined, then

(AB)C = A(BC).

Some of these conditions are cumbersome to check, but there is a
simple sufficient condition. Suppose that we write a general product
as $\prod_{i=1}^{n} (A_i - B_i)$ with $A_i \ge 0$ and $B_i \ge 0$. If all the $2^n$ products

$$A_1 A_2 \cdots A_n,\quad B_1 A_2 \cdots A_n,\quad \ldots,\quad B_1 B_2 \cdots B_n$$

are finite, then we see from Proposition 1-2 and Corollary 1-5 that we
may freely use distributivity and associativity.

The effect of matrix multiplication on matrix inequalities is sum- 
marized by the next proposition, whose proof is left to the reader. 

Proposition 1-10: Matrix inequalities of the form $A \ge B$ or $B \le A$
are preserved when both sides of the inequality are multiplied by a
non-negative matrix. Inequalities of the form $A > B$ or $B < A$ are
preserved when both sides are multiplied by a positive matrix, provided
the products have all entries finite.

Next we consider the problem of “block multiplication” of matrices.
The picture we have in mind is the following decomposition of the
matrices involved in a product:

$$\begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix} \begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix} = \begin{pmatrix} C_1 & C_2 \\ C_3 & C_4 \end{pmatrix}.$$

More specifically, let K, M, and N be index sets and let $K'$, $M'$, and $N'$,
respectively, be non-empty subsets of the index sets. Impose orderings
on K, M, and N so that the elements of $K'$, $M'$, and $N'$ precede the
other elements, which comprise the complementary sets $\tilde{K}'$, $\tilde{M}'$, and
$\tilde{N}'$. Let A, B, and C be matrices such that

(1) A is defined on K and M,

(2) B is defined on M and N, and

(3) AB = C is well defined.

Let matrices $A_1$, $A_2$, $A_3$, and $A_4$ be defined as the restriction of the
function A to the sets

(1) $K'$ and $M'$ for $A_1$

(2) $K'$ and $\tilde{M}'$ for $A_2$

(3) $\tilde{K}'$ and $M'$ for $A_3$

(4) $\tilde{K}'$ and $\tilde{M}'$ for $A_4$.

Pictorially what we are doing is writing A as four submatrices with

$$A = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}.$$

We perform the same kind of decomposition for B and C and obtain

$$\begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix} \begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix} = \begin{pmatrix} C_1 & C_2 \\ C_3 & C_4 \end{pmatrix}.$$

The proposition to follow asserts that the submatrices of A, B, and C
multiply as if they were entries themselves. Its proof depends on the
fact that matrix multiplication is defined independently of any ordering
on the index sets.



Proposition 1-11:

$$\begin{aligned}
A_1 B_1 + A_2 B_3 &= C_1 \\
A_1 B_2 + A_2 B_4 &= C_2 \\
A_3 B_1 + A_4 B_3 &= C_3 \\
A_3 B_2 + A_4 B_4 &= C_4.
\end{aligned}$$



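Proof: We prove only the first identity since the others are similar.

$$\begin{aligned}
(C_1)_{ij} = C_{ij} &= \sum_{m \in M} A_{im} B_{mj} \\
&= \sum_{m \in M'} A_{im} B_{mj} + \sum_{m \in \tilde{M}'} A_{im} B_{mj} \\
&= (A_1 B_1)_{ij} + (A_2 B_3)_{ij}.
\end{aligned}$$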
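
A small finite check of the block identities may help. This is only a sketch: the 3 × 3 integer matrices, the helper names, and the choice of the subsets $K' = M' = N' = \{0\}$ are all assumptions made for illustration.

```python
def mat_mul(A, B):
    """Ordinary product of finite matrices stored as lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    """Entry-by-entry sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def block(A, rows, cols):
    """Restriction of A to the given row and column indices."""
    return [[A[i][j] for j in cols] for i in rows]

A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[2, 0, 1], [1, 1, 0], [0, 5, 2]]
C = mat_mul(A, B)

first, rest = [0], [1, 2]   # the chosen subset and its complement
A1, A2 = block(A, first, first), block(A, first, rest)
A3, A4 = block(A, rest, first), block(A, rest, rest)
B1, B2 = block(B, first, first), block(B, first, rest)
B3, B4 = block(B, rest, first), block(B, rest, rest)

C1_from_blocks = mat_add(mat_mul(A1, B1), mat_mul(A2, B3))
C4_from_blocks = mat_add(mat_mul(A3, B2), mat_mul(A4, B4))
```

The two computed blocks agree with the corresponding restrictions of C, as the first and fourth identities assert.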



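Notice that if the submatrix $A_1$ has at least one infinite index set,
then the representation of A by

$$\begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}$$

is not the standard one

$$A = \begin{pmatrix} A_{00} & A_{01} & \cdots \\ A_{10} & A_{11} & \cdots \\ \vdots & \vdots & \end{pmatrix}.$$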



The ordering on the index sets of A is not of the same type as that of 
the non-negative integers. We recall once more, however, that the 
fundamental properties of matrices are independent of any orderings on 
the index sets. It is only the representation of a matrix as an array 
which requires these orderings. 

Limits of matrices play an important part in the study of denumer- 
able Markov chains. We shall touch only briefly at this time on the 
problems involved. 

Definition 1-12: Let $\{A^{(k)}\}$ be a sequence of matrices. We say that
$A = \lim_{k \to \infty} A^{(k)}$ exists if $A_{mn} = \lim_{k \to \infty} A^{(k)}_{mn}$ exists for every m and
n.



Notice that limits of matrices are defined entry-by-entry. No 
uniformity of convergence to the limiting matrix is assumed. 

The type of problem that arises is as follows. Let $\pi$ be a row vector
and let $\{f^{(k)}\}$ be a sequence of column vectors converging to a column
vector $f$. Is it true that $\{\pi f^{(k)}\}$ necessarily converges to $\pi f$? The
answer to this question is in the negative unless some additional
hypothesis is added. What is being attempted is an interchange of the
order of two limit operations: one from the series which defines
$\pi f^{(k)}$ and the other from the limit as k tends to $\infty$. Such an interchange
can be justified only under special circumstances, and we shall
obtain later in this chapter some sufficient conditions as special cases of
theorems of measure theory.
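
One standard counterexample, which is not taken from the text, can be sketched as follows: let the row vector be the all-ones vector and let the k-th column vector be the unit vector concentrated at index k. The function names are hypothetical. Each entry tends to zero, so the entrywise limit is the zero vector, yet every product equals 1.

```python
def f(k, m):
    """m-th entry of the k-th column vector: the unit vector at index k."""
    return 1 if m == k else 0

def pi_times_f(k):
    # pi is the all-ones row vector. The k-th column vector has a single
    # non-zero entry, at m = k, so the infinite sum reduces to one term.
    return 1 * f(k, k)

# For fixed m the entry f(k, m) is 0 as soon as k > m.
late_entries = [f(10_000, m) for m in range(100)]
products = [pi_times_f(k) for k in range(100)]
limit_of_products = products[-1]   # the products are constant in k
product_of_limit = 0               # pi times the zero vector
```

The products converge to 1 while the product with the limiting vector is 0, so the two limit operations cannot be interchanged here.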

2. Measure theory 

Let X be an arbitrary non-empty set of points and let $\mathcal{F}$ be a family
of subsets of X. We say that $\mathcal{F}$ is a field of sets if

(1) the empty set $\emptyset$ is in $\mathcal{F}$,

(2) whenever A is a set of $\mathcal{F}$, the complement of A, denoted $\tilde{A}$, is in
$\mathcal{F}$, and

(3) whenever A and B are sets of $\mathcal{F}$, so is their union, denoted
$A \cup B$.

A field of sets is called a Borel field if it has the additional property
that whenever $A_n \in \mathcal{F}$ for n = 1, 2, 3, . . . , so is $\bigcup_{n=1}^{\infty} A_n$.

The intersection of sets A and B is indicated by $A \cap B$, and the
difference $A \cap \tilde{B}$ is denoted $A - B$. From the above definitions the
reader can easily establish the following result.

Proposition 1-13: If $\mathcal{F}$ is a field of sets, then $\mathcal{F}$ contains $\emptyset$ and X
and is closed under complementation, finite unions, finite intersections,
and differences. If $\mathcal{F}$ is a Borel field, then $\mathcal{F}$ is closed under
denumerable intersections.

Proposition 1-14: For any class of sets $\mathcal{C}$ of the points of a set X,
there exists a unique smallest Borel field containing $\mathcal{C}$.

Proof: The family of all subsets of X forms a Borel field containing
$\mathcal{C}$. Form the intersection of all Borel fields which contain $\mathcal{C}$ and call
the resulting family of sets $\mathcal{G}$. Let A be in $\mathcal{G}$; then A is in all Borel
fields containing $\mathcal{C}$ and so is $\tilde{A}$. Hence $\tilde{A}$ is in $\mathcal{G}$. A similar argument
applies to intersections and denumerable unions. Thus $\mathcal{G}$ is the
smallest Borel field containing $\mathcal{C}$.
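
When X is finite, every field is a Borel field, and the smallest one containing a given class can be computed by closing the class under complements and unions. The sketch below takes this bottom-up route, which differs from the intersection argument used in the proof; the function name `generated_field` and the sample sets are assumptions made for illustration.

```python
from itertools import combinations

def generated_field(X, sets):
    """Smallest family containing `sets` that is closed under
    complementation and pairwise union, for a finite set X."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(s) for s in sets}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:
            if X - A not in family:        # close under complements
                family.add(X - A)
                changed = True
        for A, B in combinations(current, 2):
            if A | B not in family:        # close under unions
                family.add(A | B)
                changed = True
    return family

field = generated_field({1, 2, 3, 4}, [{1}, {2}])
```

In this example the points 3 and 4 are never separated by the generating sets, so the field consists of the 8 unions of the blocks {1}, {2}, and {3, 4}.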

Definition 1-15: A function p from a field of sets $\mathcal{F}$ to the extended
real number system is called a set function. If $p(A) \ge 0$ for every A
in $\mathcal{F}$, p is said to be non-negative. If $p(A \cup B) = p(A) + p(B)$ whenever
A and B are in $\mathcal{F}$ and $A \cap B = \emptyset$, p is said to be additive.

Suppose $A_n$ is in $\mathcal{F}$ for n = 1, 2, 3, . . ., and suppose $A_i \cap A_j = \emptyset$
whenever $i \ne j$. If $p(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} p(A_n)$ holds whenever $\bigcup_{n=1}^{\infty} A_n$
is a set of $\mathcal{F}$, then p is said to be completely additive. In discussing
set functions, we shall assume that there are no two sets A and B in
$\mathcal{F}$ such that $p(A) = +\infty$ and $p(B) = -\infty$, and we shall assume that p
is not identically infinite.

An additive set function p has the properties that

(1) $p(\emptyset) = 0$,

(2) $p(\bigcup_{n=1}^{N} A_n) = \sum_{n=1}^{N} p(A_n)$ for disjoint sets $\{A_n\}$, and

(3) $p(A \cup B) + p(A \cap B) = p(A) + p(B)$.

If p is non-negative and additive and if A is contained in B, then
$p(A) \le p(B)$. To see this, set $C = B \cap \tilde{A}$ so that A and C are disjoint
and $A \cup C = B$. Then p(A) + p(C) = p(B) by additivity, and the
result follows at once. We shall now establish two facts about
completely additive set functions.

Proposition 1-16: Let p be an additive set function defined on a field
of sets $\mathcal{F}$. Let $\{A_n\}$ be a sequence of sets in $\mathcal{F}$ such that $A_1 \subset A_2 \subset \cdots$,
and suppose $A = \bigcup_{n=1}^{\infty} A_n$ is in $\mathcal{F}$. If p is completely additive, then
$\lim_{n \to \infty} p(A_n) = p(A)$. Conversely, if $\lim_{n \to \infty} p(A_n) = p(A)$ for all
such sequences, p is completely additive.

Proof: Set $B_1 = A_1$ and $B_n = A_n \cap \tilde{A}_{n-1}$. Then $A_n = \bigcup_{k=1}^{n} B_k$
disjointly, and by additivity $p(A_n) = \sum_{k=1}^{n} p(B_k)$. But $A = \bigcup_{k=1}^{\infty} B_k$
and by complete additivity $p(A) = \sum_{k=1}^{\infty} p(B_k)$. The proof of the
converse is left to the reader.

A consequence of this proposition is the following: 

Corollary 1-17: Let p be an additive set function defined on a field
of sets $\mathcal{F}$ in such a way that $p(A) < \infty$ for every A. Let $\{A_n\}$ be a
sequence of sets in $\mathcal{F}$ such that $A_1 \supset A_2 \supset A_3 \supset \cdots$ and $\bigcap_{n=1}^{\infty} A_n = \emptyset$.
If p is completely additive, then $\lim_{n \to \infty} p(A_n) = 0$. Conversely, if
$\lim_{n \to \infty} p(A_n) = 0$ for all such sequences, then p is completely additive.

A non-negative completely additive set function on a field of sets
is called a measure. The set of points X with a measure defined on its
field is called a measure space. We shall usually denote measures by
$\mu$ or $\nu$. If there is no ambiguity about what measure is involved, we
shall frequently refer to X by itself as the measure space.

If X is a measure space with field of sets $\mathcal{F}$ and measure $\mu$, then X
is a set in $\mathcal{F}$ and we define $\mu(X)$ to be the total measure of the space.
A probability space is a measure space of total measure one.
We give four examples of measure spaces.

(1) Let X be any set, let $\mathcal{F} = \{\emptyset, X\}$, and define $\mu(\emptyset) = 0$ and
$\mu(X) = a > 0$. Then X is the trivial measure space.

(2) Let X be Euclidean n-space, let $\mathcal{F}$ be the Lebesgue measurable
sets, and let $\mu$ be Lebesgue measure (the natural generalization of
length, area, or volume).

(3) Let X be the set of six possible outcomes for tossing a die.
Assign weight 1/6 to each of the six points in the space, and for any subset
of X assign as a measure the sum of the weights of the points in the
set. Then $\mathcal{F}$ is the family of all subsets of X, and X is a probability
space.

(4) Let X be a denumerable index set, and let $\pi$ be a non-negative
row vector with X as its index set. Assign as a weight to each point
of X the value of the corresponding entry of $\pi$. For any subset of X
assign as a measure the sum of the weights of the points in the set.
Then $\mathcal{F}$ is the family of all subsets of X, and X is a measure space
with total measure $\pi 1$.
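
Examples (3) and (4) can be sketched in a few lines for a finite index set. The helper name `measure` is hypothetical, and exact fractions are used so that the die weights sum to exactly one.

```python
from fractions import Fraction

def measure(weights):
    """Return mu with mu(A) = sum of the weights of the points in A."""
    def mu(A):
        return sum((weights[x] for x in A), Fraction(0))
    return mu

# Example (3): a fair die, each face carrying weight 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mu = measure(die)

total = mu(set(die))        # total measure of the space
evens = mu({2, 4, 6})
A, B = {1, 2, 3}, {3, 4}
```

With `weights` read as the entries of a non-negative row vector, `mu(set(weights))` plays the role of the total measure in example (4).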

The sets of a field on which a measure $\mu$ is defined are called the
$\mu$-measurable or simply the measurable subsets of X. In the construction
of a measure on a field, it is possible for a non-empty set A to be
assigned measure zero. In example (2) above, for instance, every
denumerable set and even certain uncountable sets are sets of measure
zero. Suppose B is a subset of such a set A. If B is measurable, then
$\mu(B) \ge 0$ since $\mu$ is a measure. But

$$\mu(B) \le \mu(A) = 0$$

since B C A and A is of measure zero. Thus, a measurable subset of a 
set of measure zero is of measure zero. But there is no reason why such 
a set B has to be measurable. However, one can agree to add all 
subsets of sets of measure zero to a field and extend the resulting family 
of sets to the smallest field containing the family. Such an extended 
field is called an augmented field. It consists precisely of all sets of the 
form $(C - D) \cup E$, where C is a set in the original field and D and E
are subsets of a set of measure zero. Therefore the augmented field of 
a Borel field is again a Borel field. Note that in any augmented field 
every subset of a set of measure zero is measurable and has measure 
zero. In later chapters of this book all fields will be augmented. 

If a statement about the points of a measure space X fails to be true 
only for a set of points which is a subset of a set of measure zero, we 
say that the statement holds for almost all points of X or that it is true 
almost everywhere (abbreviated a.e.). 
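Proposition 1-18: Let $\mu$ be a measure defined on a field of sets $\mathcal{F}$.
If $\{A_n\}$ is a sequence of sets in $\mathcal{F}$, if A is in $\mathcal{F}$, and if $A \subset \bigcup_n A_n$, then

$$\mu(A) \le \sum_n \mu(A_n).$$

Proof: Write $B_n = A_n - (\bigcup_{k=1}^{n-1} A_k)$. The sets $B_n$ are disjoint in
pairs, and consequently the sets $A \cap B_n$ are also disjoint. Furthermore,
$\bigcup_n B_n = \bigcup_n A_n$ so that

$$A = A \cap \Big(\bigcup_n B_n\Big) = \bigcup_n (A \cap B_n).$$

By hypothesis $\mu$ is a measure. It is therefore completely additive and

$$\begin{aligned}
\mu(A) &= \sum_n \mu(A \cap B_n) \\
&\le \sum_n \mu(B_n) \quad \text{since } A \cap B_n \subset B_n \\
&\le \sum_n \mu(A_n) \quad \text{since } B_n \subset A_n.
\end{aligned}$$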

n 

To conclude this section we shall establish a result known as the 
Extension Theorem. The proof follows the proof of Rudin [1953]. 

Theorem 1-19: Let $\mathcal{F}$ be a field of sets in a space X and let $\nu$ be a
measure defined on $\mathcal{F}$. Suppose X can be written as the denumerable
union of sets in $\mathcal{F}$ of finite measure. If $\mathcal{G}$ is the smallest Borel field
containing $\mathcal{F}$, then $\nu$ can be extended in one and only one way to a
measure defined on all of $\mathcal{G}$ which agrees with $\nu$ on sets of $\mathcal{F}$.

Before proving the theorem, we need some preliminary lemmas and
definitions. The property in the statement of the theorem that X is
the denumerable union of sets of finite measure is summarized by saying
that $\nu$ is sigma-finite.

Let $\nu$ be a measure defined on a field of sets $\mathcal{F}$ in a space X, and
suppose $X = \bigcup_{n=1}^{\infty} A_n$ with $A_n \in \mathcal{F}$ and $\nu(A_n) < \infty$. For each subset
B of X, define $\mu(B) = \inf \{\sum_n \nu(B_n)\}$, where the infimum is taken over all
denumerable coverings $\{B_n\}$ of B by sets of $\mathcal{F}$.

Lemma 1-20: The set function $\mu$ is non-negative. If A and B are
subsets of X such that $A \subset B$, then $\mu(A) \le \mu(B)$. If C is a set in
$\mathcal{F}$, then $\mu(C) = \nu(C)$.

Proof: We see that $\mu$ is non-negative because $\mu$ is the limit of
non-negative quantities. If $A \subset B$, then $\mu(A) \le \mu(B)$ because every
covering of B is a covering of A. Let C be in $\mathcal{F}$. Then $\{C\}$ is a covering
of C and $\mu(C) \le \nu(C)$. And for any covering $\{C_n\}$

$$\nu(C) \le \sum_n \nu(C_n)$$

by Proposition 1-18. Therefore,

$$\nu(C) \le \inf \sum_n \nu(C_n) = \mu(C).$$

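Lemma 1-21: If $\{A_n\}$ is an arbitrary sequence of subsets of X and if
$A = \bigcup_n A_n$, then $\mu(A) \le \sum_n \mu(A_n)$.

Proof: Let $\epsilon > 0$ be given. Let $\{B_{nk}\}$ with k = 1, 2, 3, . . . be a
denumerable covering of $A_n$ such that $B_{nk}$ is in $\mathcal{F}$ and $\sum_k \nu(B_{nk}) \le
\mu(A_n) + \epsilon/2^n$. This choice is possible by the definition of $\mu$. Then
since all the B’s form a covering of A, we have

$$\begin{aligned}
\mu(A) &\le \sum_n \sum_k \nu(B_{nk}) \\
&\le \sum_n \mu(A_n) + \epsilon
\end{aligned}$$

and the assertion follows.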

We define a set theoretic operation $\oplus$ for subsets of $X$ by

$$A \oplus B = (A \cap \tilde{B}) \cup (B \cap \tilde{A}).$$

The set $A \oplus B$ is called the symmetric difference of $A$ and $B$. A point
is in $A \oplus B$ if it is in $A$ or $B$ but not both. We leave the details of the
proof of the next lemma to the reader.

Lemma 1-22: The subsets of a space $X$ form a ring under the
operations $\oplus$ and $\cap$ with additive identity $\emptyset$ and multiplicative identity $X$.
Every set is its own additive inverse.
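
The ring properties of Lemma 1-22 can be checked exhaustively on a small space. The sketch below is an illustration only: the four-point universe `X` and the helper names are assumptions made for the example, and a finite check is of course not a proof.

```python
from itertools import combinations

X = frozenset(range(4))  # hypothetical small universe
EMPTY = frozenset()


def subsets(universe):
    """All subsets of the universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def sym_diff(a, b):
    """(A intersect complement of B) union (B intersect complement of A), complements taken in X."""
    return (a & (X - b)) | (b & (X - a))


def check_ring_axioms():
    """Verify the identities of Lemma 1-22 for every triple of subsets of X."""
    all_sets = subsets(X)
    for a in all_sets:
        assert sym_diff(a, EMPTY) == a      # additive identity
        assert a & X == a                   # multiplicative identity
        assert sym_diff(a, a) == EMPTY      # each set is its own additive inverse
        for b in all_sets:
            assert sym_diff(a, b) == a ^ b  # agrees with Python's built-in operator
            for c in all_sets:
                assert sym_diff(sym_diff(a, b), c) == sym_diff(a, sym_diff(b, c))
                assert a & sym_diff(b, c) == sym_diff(a & b, a & c)
    return True
```

Calling `check_ring_axioms()` runs through all 4096 triples of subsets and confirms associativity of $\oplus$ and distributivity of $\cap$ over $\oplus$ in this instance.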

Define a distance $d$ between subsets of $X$ by $d(A, B) = \mu(A \oplus B)$.
We note that $d$ has the properties

$$d(A, A) = \mu(\emptyset) = 0$$

and

$$d(A, B) = \mu(A \oplus B) = d(B, A).$$

Since

$$A \cup B = (A \oplus B) \cup B,$$

we have

$$A \subset (A \oplus B) \cup B$$

and by Lemmas 1-20 and 1-21

$$\mu(A) \le d(A, B) + \mu(B).$$

Replacing $A$ by $A \oplus B$ and $B$ by $C \oplus B$, we obtain the triangle
inequality

$$d(A, B) \le d(A, C) + d(C, B).$$

Lemma 1-23: For any subsets $A_1$, $A_2$, $B_1$, $B_2$, $A$, $B$ of $X$,

$$d((A_1 \cup A_2), (B_1 \cup B_2)) \le d(A_1, B_1) + d(A_2, B_2)$$

$$d((A_1 \cap A_2), (B_1 \cap B_2)) \le d(A_1, B_1) + d(A_2, B_2)$$

$$d(\tilde{B}, \tilde{A}) = d(B, A).$$

Proof: We prove only the first and third assertions. First we
observe that $(A_1 \cup A_2) \oplus (B_1 \cup B_2) \subset (A_1 \oplus B_1) \cup (A_2 \oplus B_2)$. For
suppose $x \in (A_1 \cup A_2) \oplus (B_1 \cup B_2)$. We may assume without loss
of generality that $x \in A_1 \cup A_2$ but $x \notin B_1 \cup B_2$. If $x \in A_1$, then
$x \notin B_1$ so that $x \in A_1 \oplus B_1$. Similarly if $x \in A_2$, then $x \in A_2 \oplus B_2$
and the containment is established. The first assertion of the lemma
now follows by applying Lemmas 1-20 and 1-21. For the third part,
we have

$$B \oplus A = (A \cap \tilde{B}) \cup (\tilde{A} \cap B)$$

and

$$\tilde{B} \oplus \tilde{A} = (\tilde{A} \cap B) \cup (A \cap \tilde{B})$$

so that

$$\tilde{B} \oplus \tilde{A} = B \oplus A.$$

Definition 1-24: Convergence of sets in measure is defined by saying
that $A_n \to A$ if $\lim_{n \to \infty} d(A_n, A) = 0$. Let $\mathcal{F}^*$ be the collection of all
subsets $A$ of $X$ for which there exists a sequence $\{A_n\}$ of sets in $\mathcal{F}$
having the property that $A_n \to A$. Let $\mathcal{G}^*$ be the family of denumerable
unions of sets in $\mathcal{F}^*$.

Lemma 1-25: If $\{A_n\}$ and $\{B_n\}$ are sequences of sets in $\mathcal{F}$ such that
$A_n \to A$ and $B_n \to B$, then $A_n \cup B_n \to A \cup B$, $A_n \cap B_n \to A \cap B$,
and $\tilde{A}_n \to \tilde{A}$. Therefore, $\mathcal{F}^*$ is a field of sets. For any $C_n \to C$,
$\lim_n \mu(C_n) = \mu(C)$.



Proof: Since by Lemma 1-23

$$d((A_n \cup B_n), (A \cup B)) \le d(A_n, A) + d(B_n, B)$$

$$d((A_n \cap B_n), (A \cap B)) \le d(A_n, A) + d(B_n, B)$$

$$d(\tilde{A}_n, \tilde{A}) = d(A_n, A),$$

we have $A_n \cup B_n \to A \cup B$, $A_n \cap B_n \to A \cap B$, and $\tilde{A}_n \to \tilde{A}$. The
limit of $\mu(C_n)$ is established by the inequalities

$$\mu(C_n) \le d(C_n, C) + \mu(C)$$

and

$$\mu(C) \le d(C_n, C) + \mu(C_n).$$

Lemma 1-26: $\mu$ is additive on $\mathcal{F}^*$.

Proof: Let $A$ and $B$ be disjoint sets in $\mathcal{F}^*$ and pick $\{A_n\}$ and $\{B_n\}$
in $\mathcal{F}$ such that $A_n \to A$ and $B_n \to B$. Then since $\nu$ is additive on $\mathcal{F}$
and since $\mu$ agrees with $\nu$ on sets of $\mathcal{F}$, we have

$$\mu(A_n \cup B_n) + \mu(A_n \cap B_n) = \mu(A_n) + \mu(B_n).$$

By Lemma 1-25,

$$\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$$

or

$$\mu(A \cup B) = \mu(A) + \mu(B).$$

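Lemma 1-27: If $A = \bigcup_n A_n$ with $A$ in $\mathcal{F}^*$ and $\{A_n\}$ a sequence of
disjoint sets in $\mathcal{F}^*$, then $\mu(A) = \sum_n \mu(A_n)$.

Proof: Since $A \supset (A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_k)$, we have, by Lemma
1-20, $\mu(A) \ge \mu(A_1 \cup A_2 \cup \cdots \cup A_k)$, and by Lemma 1-26 the right
side equals $\sum_{n=1}^{k} \mu(A_n)$ for each $k$. Hence

$$\mu(A) \ge \sum_n \mu(A_n)$$

and equality holds by Lemma 1-21.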

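Lemma 1-28: If $A$ is in $\mathcal{G}^*$ and if $\mu(A)$ is finite, then $A$ is in $\mathcal{F}^*$.

Remark: If $A$ is in $\mathcal{F}^*$, then $\mu(A)$ is not necessarily finite.

Proof of lemma: Write $A = \bigcup_n A_n$ with $\{A_n\}$ in $\mathcal{F}^*$, the $A_n$ disjoint
sets, and set $B_n = \bigcup_{k=1}^{n} A_k$. Then

$$d(A, B_n) = \mu([A \cap \tilde{B}_n] \cup [\tilde{A} \cap B_n]) = \mu\Bigl(\bigcup_{k=n+1}^{\infty} A_k\Bigr) = \sum_{k=n+1}^{\infty} \mu(A_k),$$

the last equality holding by the argument of Lemma 1-27, whose proof uses
only that the $A_k$ are disjoint sets in $\mathcal{F}^*$. The same argument gives
$\sum_k \mu(A_k) = \mu(A) < \infty$.
Since the last expression on the right is the tail of a convergent series,
we have $B_n \to A$. Since $B_n \in \mathcal{F}^*$, we can find $L_n$ in $\mathcal{F}$ such that
$d(L_n, B_n) < 1/n$. Then $d(L_n, A) \le 1/n + d(B_n, A)$, and hence $L_n \to A$.
Thus $A$ is in $\mathcal{F}^*$.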

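Remark: If $A \in \mathcal{G}^*$ with $\mu(A)$ finite, then $A$ is in $\mathcal{F}^*$; hence for every
$\epsilon > 0$ there is a $B$ in $\mathcal{F}$ such that $\mu(A \oplus B) < \epsilon$. Conversely, if there
exists such a $B_\epsilon$ for any $\epsilon$, then $B_{1/n} \to A$ so that $A \in \mathcal{F}^*$ and, a
fortiori, $A \in \mathcal{G}^*$. These observations give a characterization of the sets
$A$ in $\mathcal{G}^*$ for which $\mu(A)$ is finite.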

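Lemma 1-29: $\mu$ is completely additive on $\mathcal{G}^*$ and $\mathcal{G}^*$ is a Borel field.

Proof: Suppose

$$A = \bigcup_n A_n$$

is the union of disjoint sets $\{A_n\}$ in $\mathcal{G}^*$. Then $\mu(A) \ge \mu(A_n)$ for every $n$
by Lemma 1-20, so that we may assume $\mu(A_n) < \infty$ for every $n$. The
complete additivity of $\mu$ now follows from Lemmas 1-28 and 1-27. For
the proof that $\mathcal{G}^*$ is a Borel field, we see clearly that $\mathcal{G}^*$ is closed under
denumerable unions. It remains to be proved that $\mathcal{G}^*$ is closed under
complementation. Since $\nu$ is sigma-finite, let

$$X = \bigcup_n A_n$$

with $A_n$ in $\mathcal{F}$ and $\mu(A_n) = \nu(A_n) < \infty$. Let $B$ in $\mathcal{G}^*$ be given and
suppose $B = \bigcup_k B_k$ with $B_k$ in $\mathcal{F}^*$. Since

$$A_n \cap B = \bigcup_k (A_n \cap B_k)$$

and since $A_n \cap B_k$ is in $\mathcal{F}^*$, $A_n \cap B$ is in $\mathcal{G}^*$. But by Lemma 1-20,

$$\mu(A_n \cap B) \le \mu(A_n)$$

and we have assumed that $\mu(A_n) < \infty$. Thus by Lemma 1-28,
$A_n \cap B \in \mathcal{F}^*$, and since $\mathcal{F}^*$ is a field,

$$A_n \cap (X - (A_n \cap B)) = A_n \cap \tilde{B}$$

is in $\mathcal{F}^*$. Therefore

$$\tilde{B} = X \cap \tilde{B} = \bigcup_n (A_n \cap \tilde{B})$$

is in $\mathcal{G}^*$, and the proof is complete.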

We are now in a position to prove the Extension Theorem. 

Proof of Theorem 1-19: Existence of the extension of $\nu$ to a measure
$\mu$ defined on $\mathcal{G}^*$ is proved by Lemmas 1-20 and 1-29. Since, by
Lemma 1-29, $\mathcal{G}^*$ is a Borel field containing $\mathcal{F}$, $\mathcal{G}^*$ contains $\mathcal{G}$. The
extended measure restricted to sets of $\mathcal{G}$ has the desired properties.
For uniqueness, suppose $\mu'$ is another measure on $\mathcal{G}$ that agrees with $\nu$
on $\mathcal{F}$. Since, by sigma-finiteness, $X$ is the union of sets $A_n$ in $\mathcal{F}$ of
finite $\nu$-measure, we may assume that $X$ is a disjoint union of sets $B_n$ of
finite measure by letting $B_n = A_n - \bigcup_{k<n} A_k$. Let $C$ be any set in $\mathcal{G}$;
we want to show $\mu'(C) = \mu(C)$. By definition,

$$\mu(C) = \inf \Bigl\{\sum_n \nu(C_n)\Bigr\}$$

with the infimum taken over covers $\{C_n\}$, where $C_n$ is in $\mathcal{F}$. For any
fixed cover we have

$$\mu'(C) \le \mu'\Bigl(\bigcup_n C_n\Bigr) \le \sum_n \mu'(C_n) = \sum_n \nu(C_n).$$

Therefore

$$\mu'(C) \le \inf \Bigl\{\sum_n \nu(C_n)\Bigr\} = \mu(C).$$

Writing

$$\mu'(C) = \sum_n \mu'(C \cap B_n),$$

we see that it is sufficient to show that

$$\mu'(C \cap B_n) = \mu(C \cap B_n).$$

But

$$\mu'(C \cap B_n) + \mu'(\tilde{C} \cap B_n) = \nu(B_n) = \mu(C \cap B_n) + \mu(\tilde{C} \cap B_n).$$

Now we know that $\mu'$ is dominated by $\mu$:

$$\mu'(C \cap B_n) \le \mu(C \cap B_n) \le \mu(B_n) < \infty.$$

If

$$\mu'(C \cap B_n) < \mu(C \cap B_n),$$

we obtain the contradiction

$$\mu'(C \cap B_n) + \mu'(\tilde{C} \cap B_n) < \mu(C \cap B_n) + \mu(\tilde{C} \cap B_n).$$

3. Measurable functions and Lebesgue integration 

Let $\mathcal{B}$ be a Borel field of sets in a set $X$. The measurable sets of $X$
are the sets of $\mathcal{B}$.

Definition 1-30: Let $f$ be a function with domain $X$ and with range
the extended real number system. The function $f$ is said to be a
measurable function if for each real number $c$ the set $\{x \mid f(x) < c\}$ is
measurable.

The content of the next proposition is that the property $f(x) < c$ may
be replaced by any of the conditions $f(x) \le c$, $f(x) > c$, or $f(x) \ge c$.
Therefore, if $f$ is a measurable function, then the set

$$\{x \mid c < f(x) < d\}$$

is measurable; either or both of the signs $<$ may be replaced by $\le$,
and the set is still measurable.

Proposition 1-31: The following four conditions are equivalent:

(1) $\{x \mid f(x) < c\}$ is measurable for every $c$.

(2) $\{x \mid f(x) \le c\}$ is measurable for every $c$.

(3) $\{x \mid f(x) > c\}$ is measurable for every $c$.

(4) $\{x \mid f(x) \ge c\}$ is measurable for every $c$.

Proof: From

$$\{x \mid f(x) \le c\} = \bigcap_{n=1}^{\infty} \{x \mid f(x) < c + 1/n\},$$

$$\{x \mid f(x) > c\} = X - \{x \mid f(x) \le c\},$$

$$\{x \mid f(x) \ge c\} = \bigcap_{n=1}^{\infty} \{x \mid f(x) > c - 1/n\},$$

and

$$\{x \mid f(x) < c\} = X - \{x \mid f(x) \ge c\},$$

we see that (1) implies (2), that (2) implies (3), that (3) implies (4), and
that (4) implies (1).

Proposition 1-32: Every constant function is measurable. 

Proof: If $f(x) = a$ identically, then $\{x \in X \mid f(x) < c\}$ is either $\emptyset$
or $X$.

In analogy with our procedure for matrices in Section 1, we define
$f^+$ and $f^-$ by

$$f^+(x) = \max\{f(x), 0\}$$

$$f^-(x) = -\min\{f(x), 0\}.$$

Proposition 1-33: If $f$ is measurable, then so are $f^+$, $f^-$, and $|f|$.

Proof:

$$\{x \mid f^+ > c\} = \{x \mid f > c\} \cup \{x \mid 0 > c\}$$

$$\{x \mid f^- > c\} = \{x \mid f < -c\} \cup \{x \mid 0 > c\}.$$

The set $\{x \mid 0 > c\}$ is either $\emptyset$ or $X$ and is therefore measurable. For
$|f|$ we have

$$\{x \mid |f| < c\} = \{x \mid -c < f(x) < c\}.$$

Proposition 1-34: Let $f$ and $g$ be measurable functions whose values
are finite at all points. Then $f + g$ and $f \cdot g$ are measurable.

Proof: We prove only the assertion about $f + g$. Order the
rational numbers and call the $n$th one $r_n$. Then

$$\{x \mid (f + g)(x) > c\} = \bigcup_{n=1}^{\infty} \bigl(\{x \mid f(x) > c + r_n\} \cap \{x \mid g(x) > -r_n\}\bigr),$$

so that $f + g$ is measurable.

Corollary 1-35: If $f$ is measurable, then so is $cf$ for every constant $c$.

Proposition 1-36: Let $\{f_n\}$ be a sequence of measurable functions.
Then the functions

(1) $\sup_n f_n(x)$

(2) $\inf_n f_n(x)$

(3) $\limsup_n f_n(x)$

(4) $\liminf_n f_n(x)$

are all measurable.

Proof: The assertions follow from the observations that

$$\{x \mid \sup_n f_n(x) > c\} = \bigcup_{n=1}^{\infty} \{x \mid f_n(x) > c\},$$

$$\{x \mid \inf_n f_n(x) < c\} = \bigcup_{n=1}^{\infty} \{x \mid f_n(x) < c\},$$

$$\limsup_n f_n(x) = \inf_n \sup_{m \ge n} f_m(x)$$

and

$$\liminf_n f_n(x) = \sup_n \inf_{m \ge n} f_m(x).$$

The supremum of finitely many functions is their pointwise maximum.
Therefore the maximum and minimum of finitely many
measurable functions are both measurable.

Corollary 1-37: If $\{f_n\}$ is a sequence of measurable functions and if
$f = \lim_n f_n$ exists at all points, then $f$ is measurable.

We shall give three examples of measurable functions.

(1) Let $\mathcal{F}$ be the family of sets on the real line which are either finite
unions of open and closed intervals or complements of such sets.
Then $\mathcal{F}$ is a field of sets. Let $\mathcal{B}$ be the smallest Borel field containing
$\mathcal{F}$. All continuous real-valued functions are measurable with respect
to the Borel field $\mathcal{B}$.

(2) Let $X$ be a space for which $\mathcal{B}$ is the family of all subsets of $X$.
Then every function $f$ defined on $X$ is measurable.

(3) Let $X$ be the union of a sequence of disjoint sets $\{A_n\}$, and let $\mathcal{B}$
be the family of all sets which are unions of sets in the sequence.
Then a function $f$ is measurable if and only if its restriction to the
domain $A_n$ is a constant function for each $n$. In particular, if $\mathcal{B} =
\{X, \emptyset\}$, then the measurable functions are the constant functions.

Let $A$ be any subset of $X$. The characteristic function of $A$, denoted
$\chi_A$, is defined by

$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise.} \end{cases}$$

A function that takes on only a finite number of values is called a
simple function. It may be represented, uniquely, in the form

$$(*) \qquad s = \sum_{n=1}^{N} c_n \chi_{A_n},$$

where the $c_n$ are the distinct values the function takes on and the sets
$A_n$ are disjoint. The simple function is measurable if and only if all
of the sets $A_1, A_2, \ldots, A_N$ are measurable.

Proposition 1-38: For any non-negative function $f$ defined on $X$,
there exists a sequence of non-negative simple functions $\{s_n\}$ with the
property that for each $x \in X$, $\{s_n(x)\}$ is a monotonically increasing
sequence converging to $f(x)$. If $f$ is measurable, the $\{s_n\}$ may be taken
to be measurable.

Proof: For every $n$ and for $1 \le j \le n 2^n$, set

$$A_{nj} = \Bigl\{x \Bigm| \frac{j-1}{2^n} \le f(x) < \frac{j}{2^n}\Bigr\}$$

and

$$B_n = \{x \mid f(x) \ge n\}.$$

Then

$$s_n = \sum_{j=1}^{n 2^n} \frac{j-1}{2^n} \chi_{A_{nj}} + n \chi_{B_n}$$

increases monotonically with $n$ to $f$. If $f$ is measurable, then so are
$\{A_{nj}\}$ and $B_n$, and thus $s_n$ is measurable.
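
The construction in the proof of Proposition 1-38 is easy to evaluate pointwise. The sketch below is an illustration, not part of the original argument; the function name `simple_approx` is an assumption, and it computes the value of $s_n$ at a point where $f$ takes a given non-negative value.

```python
import math


def simple_approx(value, n):
    """Value of s_n at a point where f equals `value` (value >= 0, possibly infinite).

    On B_n = {f >= n} the simple function equals n; otherwise it equals
    (j - 1) / 2^n, where (j - 1) / 2^n <= value < j / 2^n.
    """
    if value >= n:
        return float(n)
    return math.floor(value * 2 ** n) / 2 ** n


def approximations(value, how_many):
    """The first `how_many` terms s_1, s_2, ... at a point with the given value."""
    return [simple_approx(value, n) for n in range(1, how_many + 1)]
```

For a finite value $v$ and $n > v$, the error $v - s_n$ is less than $2^{-n}$, which is the reason the sequence converges to $f$ at each point; at a point where $f$ is infinite the values are $1, 2, 3, \ldots$.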

If $\mu$ is a measure defined on a Borel field $\mathcal{B}$ of subsets of $X$, we denote
the measure space by the ordered triple $(X, \mathcal{B}, \mu)$. In $(X, \mathcal{B}, \mu)$ let $E$
be a set of the family $\mathcal{B}$ and suppose $s$ is a non-negative, measurable
simple function, represented as in (*) above. Since $s$ is measurable,
$A_n$ is measurable and $\mu(A_n \cap E)$ is defined for every $n$. Set

$$I_E(s) = \sum_{n=1}^{N} c_n \mu(A_n \cap E).$$

For any non-negative measurable function $f$, define the Lebesgue
integral of $f$ on the set $E$ with respect to the measure $\mu$ by

$$\int_E f \, d\mu = \sup I_E(s),$$

where the supremum is taken over all measurable simple functions $s$ satisfying
$0 \le s \le f$. We note that the value of the integral is non-negative
and possibly infinite. It can be verified that if $s$ is a non-negative
measurable simple function, then

$$\int_E s \, d\mu = I_E(s).$$

If $f$ is an arbitrary measurable function, then by Proposition 1-33,
$\int_E f^+ d\mu$ and $\int_E f^- d\mu$ are both defined. If the integrals of $f^+$ and $f^-$
are not both infinite, we define the integral of $f$ by $\int_E f \, d\mu = \int_E f^+ d\mu -
\int_E f^- d\mu$. The function $f$ is said to be integrable on the set $E$ if
$\int_E f^+ d\mu$ and $\int_E f^- d\mu$ are both finite.

Following our examples of measure spaces and measurable functions, 
we give three examples of integration. A fourth example will arise in 
Chapter 2. 

(1) Let $\mathcal{B} = \{\emptyset, X\}$ and suppose $\mu(X) = 1$. Only the constant
functions are measurable and

$$\int_{\emptyset} c \, d\mu = 0; \qquad \int_X c \, d\mu = c.$$

(2) Let $X$ be the real line, let $\mathcal{B}$ be the Borel field of sets constructed
in the first example of measurable functions, and let $\mu$ be Lebesgue
measure. Continuous functions are measurable, and it can be shown
that the value of the Lebesgue integral of a continuous function on a
closed interval agrees with the value of the Riemann integral. More
generally one finds that every Riemann integrable function is Lebesgue
integrable, but not conversely.

(3) Let $X$ be the denumerable set of points described in Example 4
of measure spaces, and let $\pi$ be a non-negative row vector defined on $X$.
Then $\pi$ defines a measure on $X$. If $f$ is an arbitrary column vector
defined on $X$, then $f$ is a function on the points of $X$. Furthermore, $f$
is measurable since all subsets of $X$ are measurable sets. The reader
should verify that the integral of $f$ over the whole space $X$ with respect
to the measure $\pi$ is the matrix product $\pi f$ and that the condition for the
integral to be defined is precisely the condition for the matrix product
to be well defined. Because of this application of Lebesgue integration,
we often speak of column vectors as functions and non-negative
row vectors as measures. We shall return to this example in Section 5
of this chapter. The proof of the next proposition is left to the reader.
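
The identification of the integral with the matrix product $\pi f$ in example (3) can be illustrated on a finite state space. The sketch below is an assumption-laden toy: the three-point vectors are invented for the example, and finite lists stand in for the denumerable vectors of the text, so questions of convergence do not arise.

```python
def integral(pi, f):
    """Integral of f with respect to pi, computed as (pi f^+) - (pi f^-)."""
    positive_part = sum(p * max(x, 0.0) for p, x in zip(pi, f))
    negative_part = sum(p * max(-x, 0.0) for p, x in zip(pi, f))
    return positive_part - negative_part


def matrix_product(pi, f):
    """Row vector times column vector."""
    return sum(p * x for p, x in zip(pi, f))


# Hypothetical data: a non-negative row vector (measure) and a column vector (function).
pi = [0.5, 0.25, 0.25]
f = [2.0, -4.0, 8.0]
```

Here both `integral(pi, f)` and `matrix_product(pi, f)` evaluate to 2.0. In the denumerable case the two agree whenever $\pi f^+$ and $\pi f^-$ are not both infinite.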



Proposition 1-39: The Lebesgue integral satisfies these seven
properties:

(1) If $c$ is a constant function,

$$\int_E c \, d\mu = c \, \mu(E).$$

(2) If $f$ and $g$ are measurable functions whose integrals are defined
on $E$ and if $f(x) \le g(x)$ for all $x \in E$, then

$$\int_E f \, d\mu \le \int_E g \, d\mu.$$

(3) If $f$ is integrable on $E$ and if $c$ is a real number, then $cf$ is integrable
and $\int_E cf \, d\mu = c \int_E f \, d\mu$.

(4) If $f$ is measurable and $\mu(E) = 0$, then $\int_E f \, d\mu = 0$.

(5) If $E'$ and $E$ are measurable sets with $E' \subset E$ and if $f$ is a function
for which $\int_E f \, d\mu$ is defined, then $\int_{E'} f \, d\mu$ is defined. In particular,

$$\int_{E'} f^+ d\mu \le \int_E f^+ d\mu$$

and

$$\int_{E'} f^- d\mu \le \int_E f^- d\mu.$$

(6) If $|f(x)| \le c$ for all $x \in E$, if $\mu(E) < \infty$, and if $f$ is measurable,
then $f$ is integrable on $E$.

(7) If $f$ is measurable on $E$ and if $|f| \le g$ for a function $g$ integrable
on $E$, then $f$ is integrable on $E$.

Corollary 1-40: If $f$ is a non-negative measurable function with
$\int_E f \, d\mu = 0$, then $f = 0$ a.e. on $E$.

Proof: The subset of $E$ where $f(x) > 1/n$ must have measure zero
since otherwise $f$ would have positive integral by (1) and (2) of
Proposition 1-39. The set where $f \ne 0$ on $E$ is the countable union on $n$ of
these sets.

4. Integration theorems 

We shall make frequent use of four important facts about the 
Lebesgue integral. We develop these results as the four theorems of 
this section. 



Theorem 1-41: Let $f$ be a fixed measurable function and suppose that
$\int_X f \, d\mu$ is defined. Then the set function $\rho(E) = \int_E f \, d\mu$ is completely
additive.

Proof: If we can prove the theorem for non-negative functions, we
can write $f = f^+ - f^-$ and apply our result separately to $f^+$ and $f^-$.
We therefore assume that $f$ is non-negative. We must show that if
$E = \bigcup_{n=1}^{\infty} E_n$ disjointly, then $\rho(E) = \sum_{n=1}^{\infty} \rho(E_n)$. If $f$ is a
characteristic function $\chi_A$, then $\rho(E) = \int_E \chi_A \, d\mu = \mu(A \cap E)$ and the complete
additivity of $\rho$ is a consequence of the complete additivity of $\mu$. If
$f$ is a simple function, the complete additivity of $\rho$ is a consequence of
the result for characteristic functions and of the fact that the limit of a
sum is the sum of the limits. Thus, for general $f$ we have for every
measurable simple function $s$ satisfying $0 \le s \le f$,

$$\int_E s \, d\mu = \sum_{n=1}^{\infty} \int_{E_n} s \, d\mu \le \sum_{n=1}^{\infty} \rho(E_n)$$

by property (2) of Proposition 1-39. Hence

$$\rho(E) = \sup_s \int_E s \, d\mu \le \sum_{n=1}^{\infty} \rho(E_n).$$

We now prove the inequality in the other direction. By property (5)
of Proposition 1-39, $\rho(E) \ge \rho(E_n)$ for every $n$ since $f = f^+$. Thus if
$\rho(E_n) = +\infty$ for any $n$, the desired result is proved. We therefore
assume $\rho(E_n) < \infty$ for every $n$. Let $\epsilon > 0$ be given and choose a
measurable simple function $s$ satisfying $0 \le s \le f$,

$$\int_{E_1} s \, d\mu \ge \int_{E_1} f \, d\mu - \frac{\epsilon}{2},$$

and

$$\int_{E_2} s \, d\mu \ge \int_{E_2} f \, d\mu - \frac{\epsilon}{2}.$$

This choice is possible by the definition of the integral as a limit. Then

$$\rho(E_1 \cup E_2) = \int_{E_1 \cup E_2} f \, d\mu \ge \int_{E_1 \cup E_2} s \, d\mu = \int_{E_1} s \, d\mu + \int_{E_2} s \, d\mu$$

$$\ge \int_{E_1} f \, d\mu + \int_{E_2} f \, d\mu - \epsilon = \rho(E_1) + \rho(E_2) - \epsilon.$$

Hence

$$\rho(E_1 \cup E_2) \ge \rho(E_1) + \rho(E_2).$$

By induction, we obtain

$$\rho(E_1 \cup \cdots \cup E_n) \ge \rho(E_1) + \cdots + \rho(E_n)$$

and

$$\rho(E) \ge \rho(E_1) + \cdots + \rho(E_n) \quad \text{for every } n.$$

Hence

$$\rho(E) \ge \sum_n \rho(E_n).$$

The proofs of two corollaries of Theorem 1-41 are left as exercises. 
These results use only the additivity of the integral and not the 
complete additivity. 

Corollary 1-42: If $f$ is measurable, if $\int_E f \, d\mu$ is defined, and if

$$\mu(E \oplus F) = 0,$$

then $\int_F f \, d\mu$ is defined and $\int_F f \, d\mu = \int_E f \, d\mu$.

Corollary 1-43: If $\int_E f \, d\mu$ is defined, then

$$\Bigl|\int_E f \, d\mu\Bigr| \le \int_E |f| \, d\mu.$$

If $f$ is integrable on $E$, then so is $|f|$.

Let $f$ and $g$ be two functions whose integrals on $E$ are defined.
Suppose the set $A = \{x \in E \mid f(x) \ne g(x)\}$ is of measure zero; that is,
suppose $f$ and $g$ are equal almost everywhere on $E$. Writing $E =
A \cup (E \cap \tilde{A})$, we find by applying Corollary 1-42 twice,

$$\int_E f \, d\mu = \int_{E \cap \tilde{A}} f \, d\mu = \int_{E \cap \tilde{A}} g \, d\mu = \int_E g \, d\mu.$$

Functions which differ on a set of measure zero thus have equal
integrals. Therefore, when we are thinking of a function $f$ defined on $X$
in terms of integration, it is sufficient that $f$ be defined at almost all
points of $X$. And, if we agree to augment Borel fields of sets by adjoining
subsets of sets of measure zero, we see that if $f$ and $g$ differ on a
set of measure zero and if $f$ is measurable, then $g$ is measurable. With
the convention of augmenting the field, we obtain from Corollary 1-37,
for example, the result that if $\{f_n\}$ is a sequence of measurable functions
such that

$$f(x) = \lim_{n \to \infty} f_n(x)$$

almost everywhere, then $f$ is measurable. Similarly the theorems to
follow would be valid with convergence almost everywhere if the
underlying Borel field were augmented; the necessary modifications in
the proofs are easy.

We now state and prove the Monotone Convergence Theorem, which 
is due to B. Levi. 



Theorem 1-44: Let $E$ be a measurable set, and suppose $\{f_n\}$ is a
sequence of measurable functions such that

$$0 \le f_1 \le f_2 \le \cdots$$

and

$$f(x) = \lim_{n \to \infty} f_n(x).$$

Then

$$\int_E f \, d\mu = \lim_{n \to \infty} \int_E f_n \, d\mu.$$

Proof: Since the $f_n$ increase with $n$, so do the $\int_E f_n \, d\mu$. Therefore

$$k = \lim_{n \to \infty} \int_E f_n \, d\mu$$

exists. Since $f$ is non-negative and is the limit of measurable functions,
we know that $\int_E f \, d\mu$ exists, and since $f_n \le f$, we have

$$\int_E f_n \, d\mu \le \int_E f \, d\mu$$

for every $n$. Therefore, $k \le \int_E f \, d\mu$. Let $c$ be a real number satisfying
$0 < c < 1$, and let $s$ be a measurable simple function such that
$0 \le s \le f$. Set

$$E_n = \{x \in E \mid f_n(x) \ge c \, s(x)\}$$

so that $E_1 \subset E_2 \subset E_3 \subset \cdots$. Then $E = \bigcup_{n=1}^{\infty} E_n$. For any $n$ we
have

$$k \ge \int_E f_n \, d\mu \ge \int_{E_n} f_n \, d\mu \ge c \int_{E_n} s \, d\mu.$$

Since the integral is a completely additive set function (Theorem 1-41),
we have by Proposition 1-16

$$\lim_{n \to \infty} c \int_{E_n} s \, d\mu = c \int_E s \, d\mu.$$

Thus, as $n \to \infty$,

$$k \ge c \int_E s \, d\mu.$$

Letting $c \to 1$, we have $k \ge \int_E s \, d\mu$, and taking the supremum over all
$s$, we find $k \ge \int_E f \, d\mu$.



Proposition 1-45: Suppose $h = f + g$ with $f$ and $g$ integrable on $E$.
Then $h$ is integrable on $E$ and

$$\int_E h \, d\mu = \int_E f \, d\mu + \int_E g \, d\mu.$$

Proof: We first prove the assertion for $f$ and $g$ non-negative. For
simple functions the assertion is trivial. If $f$ and $g$ are not both simple,
apply Proposition 1-38 to find monotone sequences of non-negative
measurable simple functions $\{t_n\}$ and $\{u_n\}$ converging to $f$ and $g$. Then
$\{s_n = t_n + u_n\}$ converges to $h$, and since

$$\int_E s_n \, d\mu = \int_E t_n \, d\mu + \int_E u_n \, d\mu,$$

the result follows from Theorem 1-44. Next, if $f \ge 0$, $g \le 0$, and
$h = f + g \ge 0$, we have $f = h + (-g)$ with $h \ge 0$ and $(-g) \ge 0$, so
that

$$\int_E f \, d\mu = \int_E h \, d\mu + \int_E (-g) \, d\mu$$

or

$$\int_E h \, d\mu = \int_E f \, d\mu + \int_E g \, d\mu.$$

Since the right side is finite and since $h \ge 0$, $h$ is integrable. For an
arbitrary $h \ge 0$, decompose $E$ into the disjoint union of three sets, one
where $f \ge 0$ and $g \ge 0$, one where $f \ge 0$ and $g < 0$, and one where
$f < 0$ and $g \ge 0$. Theorem 1-41 then gives the desired result. Finally,
for general $h$, write $h = h^+ - h^-$ and consider $h^+$ and $h^-$ separately.

Corollary 1-46: Let $E$ be a measurable set, and suppose $\{f_n\}$ is a
sequence of non-negative measurable functions with

$$f(x) = \sum_{n=1}^{\infty} f_n(x).$$

Then

$$\int_E f \, d\mu = \sum_{n=1}^{\infty} \int_E f_n \, d\mu.$$

Proof: The result follows from Proposition 1-45 and Theorem 1-44.

Proposition 1-47: If $f$ is a non-negative integrable function, then for
every $\epsilon > 0$ there is a $\delta > 0$ such that, if $\mu(E) < \delta$, then

$$\int_E f \, d\mu < \epsilon.$$

Proof: Set $f_n = \min(f, n)$. By Theorem 1-44,

$$\lim_n \int_X f_n \, d\mu = \int_X f \, d\mu.$$

Since $f$ is integrable, we may find an $N$ such that

$$\int_X (f - f_N) \, d\mu < \frac{\epsilon}{2}$$

by Proposition 1-45. Let $\delta = \epsilon/(2N)$. If $\mu(E) < \delta$, then

$$\int_E f \, d\mu = \int_E (f - f_N) \, d\mu + \int_E f_N \, d\mu \quad \text{by Proposition 1-45}$$

$$\le \int_X (f - f_N) \, d\mu + N \mu(E) \quad \text{by Proposition 1-39}$$

$$< \epsilon.$$



Our third theorem for this section is known as Fatou’s Theorem. 

Theorem 1-48: Let $E$ be a measurable set, and let $\{f_n\}$ be a sequence
of non-negative measurable functions. Then

$$\int_E \liminf_n f_n \, d\mu \le \liminf_n \int_E f_n \, d\mu.$$

In particular, if $f(x) = \lim_n f_n(x)$, then

$$\int_E f \, d\mu \le \liminf_n \int_E f_n \, d\mu.$$

Proof: Set $g_n(x) = \inf_{m \ge n} f_m(x)$. Then $0 \le g_1(x) \le g_2(x) \le \cdots$, and
$g_n$ is measurable on $E$. We have $g_n(x) \le f_n(x)$ and

$$\lim_n g_n(x) = \liminf_n f_n(x).$$

By Theorem 1-44,

$$\int_E \liminf_n f_n \, d\mu = \int_E \lim_n g_n \, d\mu = \lim_n \int_E g_n \, d\mu.$$

The result now follows from the inequality $\int_E g_n \, d\mu \le \int_E f_n \, d\mu$.

The fourth basic integration theorem is the Lebesgue Dominated 
Convergence Theorem. 



Theorem 1-49: Let $E$ be a measurable set, and suppose $\{f_n\}$ is a
sequence of measurable functions such that for some integrable $g$,
$|f_n| \le g$ for all $n$. If $f(x) = \lim_n f_n(x)$, then $\lim_n \int_E f_n \, d\mu$ exists and

$$\int_E f \, d\mu = \lim_n \int_E f_n \, d\mu.$$

Proof: By property (7) of Proposition 1-39, $f$ is integrable and so is
$f_n$ for every $n$. Apply Fatou’s Theorem first to $f_n + g \ge 0$ to obtain

$$\int_E (f + g) \, d\mu \le \liminf_n \int_E (f_n + g) \, d\mu$$

or

$$\int_E f \, d\mu \le \liminf_n \int_E f_n \, d\mu.$$

Apply the theorem once more to $g - f_n \ge 0$ to obtain

$$\int_E (g - f) \, d\mu \le \liminf_n \int_E (g - f_n) \, d\mu$$

or

$$-\int_E f \, d\mu \le \liminf_n \int_E (-f_n) \, d\mu$$

or

$$\int_E f \, d\mu \ge \limsup_n \int_E f_n \, d\mu.$$

Therefore $\lim_n \int_E f_n \, d\mu$ exists and has the value asserted.

Corollary 1-50: Let $E$ be a set of finite measure and suppose $\{f_n\}$ is a
sequence of measurable functions such that $|f_n| \le c$ for all $n$. If
$f(x) = \lim_n f_n(x)$, then $\int_E f \, d\mu = \lim_n \int_E f_n \, d\mu$.

Much of the discussion of this section has dealt with the following
problem: A sequence of integrable functions $f_n$ converges a.e. to a function
$f$; we want to be able to conclude that $\int f_n \, d\mu$ tends to $\int f \, d\mu$. First
we should note that $f_n^+ \to f^+$ and $f_n^- \to f^-$ at almost all points and
hence it is sufficient to check the convergence of the integral separately
for these two sequences. Thus we may consider the case $f_n \ge 0$ alone.
For non-negative functions Fatou’s inequality is the only general
result; one cannot conclude equality without a further hypothesis.
Two sufficient conditions are given by monotone and dominated
convergence.

But if we restrict our attention to a space of finite total measure, then 
we can give a necessary and sufficient condition. 

Definition 1-51: A sequence $\{f_n\}$ of non-negative integrable functions
is said to be uniformly integrable if for each $\epsilon > 0$ there is a number $k$
such that

$$\int_{\{x \mid f_n(x) > k\}} f_n \, d\mu < \epsilon$$

holds for every $n$.

Equivalently we may require that the inequality holds for all
sufficiently large $n$. For suppose it holds for $n > N$ and for the number $c$.
Then since each $f_n$ is integrable, there is a $k_n$ depending on $n$ (and, of
course, $\epsilon$) such that

$$\int_{\{x \mid f_n(x) > k_n\}} f_n \, d\mu < \epsilon$$

and we may choose $k = \sup\{k_1, k_2, \ldots, k_N, c\}$.

Proposition 1-52: If $\{f_n\}$ is a sequence of non-negative integrable
functions tending to $f$ and if $\mu(E) < \infty$, then $\int_E f_n \, d\mu \to \int_E f \, d\mu$ if and
only if the $\{f_n\}$ are uniformly integrable.

Remark: The sequence $\{f_n\}$ need converge to $f$ only almost
everywhere for the proposition to be valid, provided $f$ is assumed measurable.
This measurability condition is always satisfied if the underlying Borel
field is augmented.

Proof: We shall use the notation $f^{(k)}$ for the function $f$ truncated at $k$;
that is, $f^{(k)}(x) = \inf(f(x), k)$. Let

$$A_n = \int_E f \, d\mu - \int_E f_n \, d\mu$$

$$B^k = \int_E f \, d\mu - \int_E f^{(k)} \, d\mu$$

$$C_n^k = \int_E f^{(k)} \, d\mu - \int_E f_n^{(k)} \, d\mu$$

$$D_n^k = \int_E f_n \, d\mu - \int_E f_n^{(k)} \, d\mu = \int_{\{x \mid f_n > k\}} (f_n - k) \, d\mu.$$

We have

$$A_n = B^k + C_n^k - D_n^k.$$

Clearly $f^{(k)}$ increases monotonically to $f$, so that $B^k$ tends to 0 by
monotone convergence. Since $f_n \to f$, $f_n^{(k)} \to f^{(k)}$. But $f_n^{(k)}$ is bounded
by $k$. Thus, on the totally finite measure space $E$, $\lim_n C_n^k = 0$ by
Corollary 1-50. Hence by choosing a large $k$ and then a sufficiently
large $n$ (depending on $k$), we can make the first two terms on the right
side as small as desired. If the functions are uniformly integrable,
then we can find a large $k$ (perhaps larger than the one already chosen)
for which $D_n^k$ will be small for all $n$. Hence, for all sufficiently large $n$,
the left side is small. Thus the integrals converge.

Conversely, suppose that $A_n \to 0$. Then there is a $k$ such that for
all sufficiently large $n$, $D_n^k < \epsilon/2$. Thus for all $n$ sufficiently large,
we have

$$\frac{\epsilon}{2} > \int_{\{x \mid f_n > k\}} (f_n - k) \, d\mu \ge \int_{\{x \mid f_n > 2k\}} (f_n - k) \, d\mu \ge \frac{1}{2} \int_{\{x \mid f_n > 2k\}} f_n \, d\mu$$

since $f_n - k > \frac{1}{2} f_n$ on the set in the last two integrals. Taking $2k$ as
the number in the equivalent definition, we see that we have uniform
integrability.
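
A small numerical sketch may help to separate the two cases in Proposition 1-52. Everything in it is a hypothetical example rather than part of the text: the measure space is the set of points $1, \ldots, 40$ with $\mu(\{j\}) = 2^{-j}$ (a truncation of a denumerable space of finite total measure), `spike` is a sequence tending to 0 whose integrals stay at 1, and `bump` is a sequence tending to 0 whose integrals also tend to 0.

```python
POINTS = range(1, 41)
WEIGHT = {j: 2.0 ** -j for j in POINTS}  # mu({j}) = 2^-j, so the total mass is below 1


def spike(n):
    """f_n = 2^n at the point n and 0 elsewhere; its integral is exactly 1."""
    return lambda j: 2.0 ** n if j == n else 0.0


def bump(n):
    """g_n = n at the point n and 0 elsewhere; its integral is n / 2^n."""
    return lambda j: float(n) if j == n else 0.0


def integral(f):
    return sum(WEIGHT[j] * f(j) for j in POINTS)


def tail_integral(f, k):
    """Integral of f over the set where f > k, as in Definition 1-51."""
    return sum(WEIGHT[j] * f(j) for j in POINTS if f(j) > k)


def worst_tail(family, k):
    """Largest tail integral at level k over the members n = 1, ..., 40 of the family."""
    return max(tail_integral(family(n), k) for n in POINTS)
```

For `spike` the worst tail equals 1 at every level $k$ that some $2^n$ exceeds, so no $k$ works for small $\epsilon$ and the integrals do not converge to the integral of the limit. For `bump` the worst tail at level $k$ is at most $(k+1)/2^{k+1}$, which can be made as small as desired.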



5. Limit theorems for matrices 

We have already said that if $\pi$ is a row vector and if $\{f^{(k)}\}$ is a sequence
of column vectors converging to $f$, then it is not necessarily true that
$\pi f^{(k)}$ converges to $\pi f$. Our object in this section is to obtain sufficient
criteria to justify saying that $\pi f = \lim_{k \to \infty} \pi f^{(k)}$.

In the examples of Lebesgue integration, we noted that non-negative 
row vectors are measures and column vectors are functions. Functions 
are integrated by forming the matrix product of the measure and the 
function. Thus, the theorems of Section 4 immediately give us the 
following four results. In each of them it should be borne in mind 
that: 

(1) There is a corresponding result if row vectors are thought of as 
functions and column vectors are thought of as measures. 

(2) These results imply corresponding results about matrices which
are not just row or column vectors. (Recall the discussion in
Section 1 about proving matrix equalities entry-by-entry.)

Proposition 1-53: Let $\pi \ge 0$ be a row vector and suppose $\{f^{(k)}\}$ is a
sequence of column vectors converging entry-by-entry to $f$ and
satisfying

$$0 \le f^{(1)} \le f^{(2)} \le \cdots.$$

Then $\pi f = \lim_k \pi f^{(k)}$.

Proof: This result is a restatement of the Monotone Convergence
Theorem as it applies to matrices.

Corollary 1-54: Let $\pi \ge 0$ be a row vector and suppose $\{f^{(k)}\}$ is a
sequence of non-negative column vectors such that

$$f = \sum_{k=1}^{\infty} f^{(k)}.$$

Then

$$\pi f = \sum_{k=1}^{\infty} \pi f^{(k)}.$$

Proof: This corollary is immediate from Corollary 1-46.

Proposition 1-55: Let $\pi \ge 0$ be a row vector and suppose $\{f^{(k)}\}$ is a
sequence of non-negative column vectors. Then

$$\pi(\liminf_k f^{(k)}) \le \liminf_k \pi f^{(k)}.$$

If $f = \lim_k f^{(k)}$ exists, then $\pi f \le \liminf_k \pi f^{(k)}$.

Proof: This is Fatou’s Theorem.







Proposition 1-56: Let $\pi \ge 0$ be a row vector such that $\pi 1$ is finite.
If $\{f^{(k)}\}$ is a sequence of column vectors such that $|f^{(k)}| \le c1$ and
$f = \lim_k f^{(k)}$ exists, then

$$\pi f = \lim_k \pi f^{(k)}.$$

Proof: The result follows from Corollary 1-50.

A harder problem arises with a sequence of non-negative row vectors
$\pi^{(k)}$ converging to a row vector $\pi$. It is not sufficient for $\pi^{(k)} 1 \le M$
and $|f| \le c1$ in order for $\pi f = \lim_k \pi^{(k)} f$. For set

$$\pi^{(1)} = (1 \ 0 \ 0 \ 0 \ \ldots)$$

$$\pi^{(2)} = (0 \ 1 \ 0 \ 0 \ \ldots)$$

$$\pi^{(3)} = (0 \ 0 \ 1 \ 0 \ \ldots)$$

and take $f = 1$. Then $\pi = 0$ so that $\pi f = 0$, while $\lim_k \pi^{(k)} f = 1$.
We give two sufficient conditions for

$$\pi f = \lim_k \pi^{(k)} f;$$

our integration theorems do not provide us with quick proofs, however.
The second proposition is closely related to the Silverman-Toeplitz
Theorem on summability methods.
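
The counterexample above can be reproduced with truncated vectors. In the sketch below the truncation length `N` and the second column vector `vanishing` are assumptions introduced for illustration; `vanishing` has entries tending to 0 and is included to contrast with the constant vector $f = 1$.

```python
N = 50  # truncation length; the vectors in the text are infinite


def unit_row(k):
    """pi^(k): 1 in position k (counting from 1) and 0 elsewhere, truncated to N entries."""
    return [1.0 if j == k else 0.0 for j in range(1, N + 1)]


def product(pi, f):
    """Row vector times column vector."""
    return sum(p * x for p, x in zip(pi, f))


ones = [1.0] * N                                 # the column vector f = 1
vanishing = [1.0 / j for j in range(1, N + 1)]   # entries tending to 0
```

Each fixed entry of `unit_row(k)` is 0 once $k$ exceeds its position, so the rows converge entrywise to 0, yet `product(unit_row(k), ones)` is 1 for every $k$. With `vanishing` in place of `ones` the products are $1/k$, which do tend to $\pi f = 0$.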

Proposition 1-57: If $\{\pi^{(k)}\}$ is a sequence of non-negative row vectors
converging to $\pi$, if $f$ is a column vector such that $0 \le f \le c1$ for some
$c$, and if $\pi 1 = \lim_k \pi^{(k)} 1$ with $\pi 1$ finite, then

$$\pi f = \lim_k \pi^{(k)} f.$$

Proof: Take $f$ as a measure and $\{\pi^{(k)}\}$ as a sequence of non-negative
functions and apply Fatou’s Theorem. We have

$$\pi f \le \liminf_k \pi^{(k)} f.$$

With $c1 - f$ as a measure and $\{\pi^{(k)}\}$ as a sequence of functions, Fatou’s
Theorem gives

$$\pi(c1 - f) \le \liminf_k \pi^{(k)}(c1 - f).$$

Since $\pi 1$ is finite and $\lim_k \pi^{(k)} 1 = \pi 1$, we find

$$-\pi f \le \liminf_k (-\pi^{(k)} f)$$

or

$$\pi f \ge \limsup_k \pi^{(k)} f.$$

Proposition 1-58: Let $\{\pi^{(k)}\}$ be a sequence of row vectors converging
to $\pi$ and satisfying $|\pi^{(k)}| 1 \le M$. Suppose $f$ is a column vector with
the property that for any $\delta > 0$ only finitely many entries of $f$ have
absolute value greater than $\delta$. Then

$$\pi f = \lim_k \pi^{(k)} f.$$

Proof: The entries of $f$ are clearly bounded, say by $c$. Numbering
the entries, we have for every $N$

$$|\pi f - \pi^{(k)} f| \le \sum_{j=1}^{N} |\pi_j - \pi_j^{(k)}| \, |f_j| + \sum_{j>N} (|\pi_j| + |\pi_j^{(k)}|) \, |f_j|.$$

Let $\epsilon > 0$ be given. Choose $N$ sufficiently large that $|f_j| < \epsilon/4M$ for
$j > N$. Choose $k$ sufficiently large that $|\pi_j - \pi_j^{(k)}| < \epsilon/2cN$ for
$j \le N$. Then $|\pi f - \pi^{(k)} f| < \epsilon$, and the result is established.

As we noted in Section 1, results about general denumerable matrices
can be reduced to results about row and column vectors. In particular,
the propositions of the present section apply equally well to sequences
of matrix products of the forms $\{A^{(k)} B\}$ and $\{A B^{(k)}\}$.



6. Some general theorems from analysis 

In this section we collect a variety of results from analysis which we 
shall need in later chapters. We prove only some of them. At first 
reading the reader may find it wise to skip this section, returning to it 
later as the material is required. 

a. Stirling’s formula. Stirling’s formula gives an asymptotic
expression for $m!$. The approximation is

$$m! \sim \frac{m^m}{e^m} \sqrt{2\pi m},$$

where the symbol $\sim$ indicates that the ratio of the two quantities tends
to one as $m$ increases. For a proof, see Courant and Hilbert [1953],
pp. 522-524. Stirling’s formula immediately gives an approximation
for the binomial coefficient $\binom{rn}{n}$ for large $n$. The coefficient $\binom{a}{b}$ is
defined as

$$\frac{a!}{b!\,(a - b)!}.$$

Lemma 1-59: For $r > 1$,

$$\binom{rn}{n} \sim c \, \frac{1}{\sqrt{n}} \left(\frac{r^r}{(r-1)^{r-1}}\right)^n,$$

where $c$ is a constant depending on $r$ but not on $n$.

The multinomial coefficient $\binom{a}{b, c, \ldots, d}$ is defined to be $\dfrac{a!}{b!\,c! \cdots d!}$.
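
Both asymptotic statements can be checked numerically. The sketch below is illustrative only: the helper names are assumptions, $r$ is restricted to integers so that $rn$ is an integer, and the value $\sqrt{r / (2\pi(r-1))}$ used for the constant $c$ is what substituting Stirling's formula into the three factorials suggests, not a value quoted from the text.

```python
import math


def stirling(m):
    """Right-hand side of Stirling's formula: (m / e)^m * sqrt(2 * pi * m)."""
    return (m / math.e) ** m * math.sqrt(2.0 * math.pi * m)


def stirling_ratio(m):
    """m! divided by its Stirling approximation; should tend to 1."""
    return math.factorial(m) / stirling(m)


def binomial_ratio(r, n):
    """binom(rn, n) divided by n^(-1/2) * (r^r / (r-1)^(r-1))^n, for integer r > 1."""
    growth = r ** r / (r - 1) ** (r - 1)
    return math.comb(r * n, n) * math.sqrt(n) / growth ** n


def limiting_constant(r):
    """Candidate value of c obtained from Stirling's formula (derived, not quoted)."""
    return math.sqrt(r / (2.0 * math.pi * (r - 1)))
```

For $r = 2$ this recovers the familiar statement that $\binom{2n}{n}$ behaves like $4^n / \sqrt{\pi n}$; `binomial_ratio(2, 200)` is already within a fraction of a percent of `limiting_constant(2)`.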

b. Difference equations. An $n$th order linear difference equation
with constant coefficients is an expression of the form

$$y_{k+n} + c_{n-1} y_{k+n-1} + \cdots + c_1 y_{k+1} + c_0 y_k = r_k,$$

where $y_k$ and $r_k$ are functions defined on the integers and where the
$c_{n-1}, \ldots, c_0$ are complex numbers. The equation is homogeneous if
$r_k = 0$ and nonhomogeneous otherwise. For a nonhomogeneous
equation, we refer to any single function satisfying the equation as a
particular solution, and we call the set of functions satisfying the same
equation with $r_k = 0$ the homogeneous solution. The totality of
solutions to any difference equation is known as the general solution.

Proposition 1-60: Every solution of the difference equation

$$y_{k+n} + c_{n-1} y_{k+n-1} + \cdots + c_0 y_k = 0$$

is a linear combination of $n$ fixed functions, obtained as follows: If $a$
is a root of multiplicity $q$ of the characteristic equation

$$x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0 = 0,$$

then $q$ of the functions are

$$a^k, \ k a^k, \ k^2 a^k, \ \ldots, \ k^{q-1} a^k.$$

Conversely, each of these functions is a solution of the difference
equation. Furthermore, each solution of the equation

$$y_{k+n} + c_{n-1} y_{k+n-1} + \cdots + c_0 y_k = r_k$$

is the sum of a fixed particular solution and some solution of

$$y_{k+n} + c_{n-1} y_{k+n-1} + \cdots + c_0 y_k = 0.$$

Conversely, every such sum is a solution of the nonhomogeneous
equation.

For a proof of this proposition, see Hildebrand [1956], pp. 202-203.
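
A repeated root shows how the functions $a^k$ and $k a^k$ enter. The sketch below is a worked example chosen for illustration, not taken from the text: the equation $y_{k+2} - 4y_{k+1} + 4y_k = 0$ has characteristic equation $x^2 - 4x + 4 = 0$ with the double root $a = 2$, and the initial values $y_0 = 1$, $y_1 = 6$ are hypothetical.

```python
def iterate(c, initial, count):
    """First `count` values of the solution of
    y_{k+n} + c[n-1] y_{k+n-1} + ... + c[0] y_k = 0 with the given initial values."""
    ys = list(initial)
    n = len(c)
    while len(ys) < count:
        tail = ys[-n:]  # y_k, ..., y_{k+n-1}
        ys.append(-sum(ci * yi for ci, yi in zip(c, tail)))
    return ys[:count]


def residual(y, c, k):
    """Left side of the homogeneous equation at index k for a function y."""
    n = len(c)
    return y(k + n) + sum(c[i] * y(k + i) for i in range(n))


COEFFS = [4, -4]  # c_0 = 4, c_1 = -4


def closed_form(k):
    """(1 + 2k) 2^k, the combination of 2^k and k 2^k matching y_0 = 1, y_1 = 6."""
    return (1 + 2 * k) * 2 ** k
```

Here `iterate(COEFFS, [1, 6], 6)` gives 1, 6, 20, 56, 144, 352, in agreement with `closed_form`, and `residual` vanishes for each of the two basic functions $2^k$ and $k 2^k$.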

c. Cesaro summability and Abel summability. Let $\{a_n\}$ be a sequence
of real numbers. Define $b_n$ to be the arithmetic mean of the first n
terms of the sequence $\{a_n\}$. The sequence $\{b_n\}$ is called the sequence of
Cesaro averages of the sequence $\{a_n\}$. The sequence $\{a_n\}$ is said to be
Cesaro summable if its sequence, $\{b_n\}$, of Cesaro averages has a limit.
If $\{A^{(n)}\}$ is a sequence of matrices, the sequence $\{B^{(n)}\}$ of Cesaro averages
is defined entry-by-entry: $B^{(n)}_{ij}$ is the Cesaro average of
$A^{(1)}_{ij}, \ldots, A^{(n)}_{ij}$. The basic fact about Cesaro summability is the
following proposition.



Proposition 1-61: If a sequence $\{a_n\}$ converges to a limit L, then the
sequence of Cesaro averages $\{b_n\}$ converges and its limit is L.

Proof: Let f be the column vector whose nth entry is $a_n - L$, and
let $\pi^{(n)}$ be the row vector defined by

$$\pi^{(n)}_j = \begin{cases} 1/n & \text{for } 1 \le j \le n \\ 0 & \text{for } j > n. \end{cases}$$

Then

$$b_n = \pi^{(n)} f + L.$$

Now $\pi^{(n)}\mathbf{1} = 1$, $\lim_n \pi^{(n)}_j = 0$, and f is a column vector with entries
tending to 0. Hence by Proposition 1-58, $\lim_n \pi^{(n)} f = 0$. Therefore
$\lim_n b_n = L$.



The converse of Proposition 1-61 is false. The sequence $\{a_n\}$ defined
by $a_{2n} = 0$, $a_{2n+1} = 1$ does not converge, but it is Cesaro summable.
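
A minimal sketch of Cesaro averaging, using exact rational arithmetic. The function name `cesaro_averages` and the two test sequences are assumptions made for the illustration: one sequence converges to 1, the other alternates between 0 and 1.

```python
from fractions import Fraction

def cesaro_averages(a):
    """Return [b_1, ..., b_N], where b_n is the mean of the first n terms of a."""
    averages, total = [], Fraction(0)
    for n, term in enumerate(a, start=1):
        total += term
        averages.append(total / n)
    return averages

N = 1000
convergent = [1 + Fraction(1, n) for n in range(1, N + 1)]  # tends to 1
alternating = [Fraction(n % 2) for n in range(N)]           # 0, 1, 0, 1, ...

b_conv = cesaro_averages(convergent)
b_alt = cesaro_averages(alternating)

print(float(b_conv[-1]))   # close to the limit 1 of the original sequence
print(b_alt[-1])           # the alternating sequence averages out to 1/2
```

The first sequence shows Proposition 1-61 at work. The second has no limit, yet its averages stay within 1/(2n) of 1/2.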

Let $\{c_n\}$ be a sequence of real numbers. (In most applications the
partial sums $c_0 + \cdots + c_n$ are assumed bounded.) If the limit

$$\lim_{t \uparrow 1} \sum_{n=0}^{\infty} c_n t^{n}$$

exists, the limit is called the Abel sum of the series $\sum c_n$, and the series
is said to be Abel summable. Abel's Theorem is the following result.

Proposition 1-62: If the series $\sum c_n$ converges to a finite limit L, then
it is Abel summable and its Abel sum is L.



Proof: Since the partial sums converge to a finite limit, the $c_n$ are
bounded and $\sum c_n t^n$ converges absolutely for $0 \le t < 1$. Let $\{t_k\}$ be any
sequence of positive reals less than one and increasing to one. Let f
be the column vector whose nth entry is $(c_0 + \cdots + c_n) - L$, and let
$\pi^{(k)}$ be the row vector defined by

$$\pi^{(k)}_n = (1 - t_k)\,t_k^{\,n}.$$

Then

$$\pi^{(k)} f = \sum_n c_n t_k^{\,n} - L.$$

Now $\pi^{(k)}\mathbf{1} = 1$, $\lim_k \pi^{(k)}_n = 0$, and f is a column vector with entries
tending to 0. Hence by Proposition 1-58, $\lim_k \pi^{(k)} f = 0$. That is,

$$\lim_{k \to \infty} \sum_n c_n t_k^{\,n} = L.$$

This equality for every such sequence implies that

$$\lim_{t \uparrow 1} \sum_n c_n t^{n} = L.$$
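
The following sketch evaluates the power series $\sum c_n t^n$ numerically for t close to one. The helper `abel_partial`, the truncation length, and both example series are assumptions for the illustration. The alternating harmonic series converges to ln 2, so Proposition 1-62 applies to it. The series 1 - 1 + 1 - ... is included only as an aside: it has bounded partial sums and Abel sum 1/2 although it does not converge.

```python
import math

def abel_partial(c, t, terms):
    """Approximate sum_{n>=0} c(n) t^n by its first `terms` terms (0 <= t < 1)."""
    total, power = 0.0, 1.0
    for n in range(terms):
        total += c(n) * power
        power *= t
    return total

def alt_harmonic(n):
    return (-1) ** n / (n + 1)   # series converges to ln 2

def grandi(n):
    return (-1) ** n             # partial sums 1, 0, 1, 0, ...: no limit

for t in (0.9, 0.99, 0.999):
    # 50_000 terms make the neglected tail negligible for these values of t
    print(t, abel_partial(alt_harmonic, t, 50_000), abel_partial(grandi, t, 50_000))
print(math.log(2))
```

As t increases toward one, the first column of values approaches ln 2 and the second approaches 1/2.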

d. Convergent subsequences of matrices. A bounded sequence of 
real numbers has a subsequence which converges to a finite limit. We 
shall obtain a generalization of this result to matrices. 

Proposition 1-63: Let $\{A^{(n)}\}$ be a sequence of matrices with the
property that for some pair of real numbers c and d, $cE \le A^{(n)} \le dE$
for all n. Then there exists a subsequence of matrices $\{A^{(n_\nu)}\}$ which
converges in every entry.

Proof: Since there are only denumerably many entries in each
matrix, they can be numbered by a subset of the positive integers.
Select a subsequence $A^{(1_1)}, A^{(1_2)}, \ldots$ which converges in the first entry. Let
$A^{(2_1)}, A^{(2_2)}, \ldots$ be a subsequence of $A^{(1_1)}, A^{(1_2)}, \ldots$ which converges
in the second entry. In general, let $A^{(m_1)}, A^{(m_2)}, \ldots$ be a subsequence
of $A^{((m-1)_1)}, A^{((m-1)_2)}, \ldots$ which converges in the mth entry. Finally set

$$A^{(n_\nu)} = A^{(\nu_\nu)}.$$

Then $\{A^{(n_\nu)}\}$ converges in every entry.

Corollary 1-64: Let $\{A^{(n)}\}$ be a sequence of matrices with the property
that $cE \le A^{(n)} \le dE$ for all n. Then $\lim_n A^{(n)} = A$ exists if and only
if $\lim_\nu A^{(n_\nu)} = A$ for every convergent subsequence $\{A^{(n_\nu)}\}$.

Proof: The necessity of the condition is trivial. For the sufficiency
suppose $\lim_n A^{(n)}_{ij}$ does not exist. Then $\liminf A^{(n)}_{ij} < \limsup A^{(n)}_{ij}$.
Pick a subsequence of $\{A^{(n)}\}$ that converges in the i-jth entry to
$\limsup A^{(n)}_{ij}$ and do the same for $\liminf A^{(n)}_{ij}$. Apply Proposition 1-63
to extract subsequences convergent in all entries from these sequences,
and the result follows at once.

e. Sets of positive integers closed under addition. The greatest
common divisor of a non-empty set of positive integers is the largest
integer which divides all of them. If the set consists of $\{n_1, n_2, \ldots\}$,
its greatest common divisor is denoted $(n_1, n_2, \ldots)$.

Lemma 1-65: If T is a set of positive integers with greatest common 
divisor d, then there exists a finite subset of T for which d is the 
greatest common divisor. 
Proof: Let $n_1 = k_1 d$ be an element of T. If $k_1 = 1$, $\{n_1\}$ is the
required set. If not, choose $n_2$ such that $n_1 \nmid n_2$. Then $(n_1, n_2) =
k_2 d$ with $k_2 < k_1$. If $k_2 = 1$, $\{n_1, n_2\}$ is the required set. Otherwise,
find $n_3$ such that $k_2 d \nmid n_3$ and set $(n_1, n_2, n_3) = k_3 d$ with $k_3 < k_2$.
Continuing in this way, we obtain a decreasing sequence of integers
$k_1 > k_2 > \cdots$ bounded below by 1. It must terminate, and then we have
constructed the finite set.

Lemma 1-66: Let T be a non-empty set of positive integers which is
closed under addition and which has the greatest common divisor d.
Then all sufficiently large multiples of d are in the set T.

Proof: If $d \ne 1$, divide all the elements in T by d and reduce the
problem to the case d = 1. By Lemma 1-65 there is a finite subset
$\{n_1, \ldots, n_s\}$ of T with greatest common divisor 1. Then there exist
integers $c_1, \ldots, c_s$ with the property

$$c_1 n_1 + \cdots + c_s n_s = 1.$$

If we collect the positive terms and the negative terms and note that T
is closed under addition, we find that T contains non-negative integers
m and n with m - n = 1. Suppose $q \ge n(n - 1)$. Then $q = an + b$
with $a \ge n - 1$ and $0 \le b \le n - 1$. Thus

$$q = (a - b)n + bm$$

and q is in T.
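
Lemma 1-66 can be checked by brute force on small examples. In this sketch the generating sets {6, 10, 15} and {4, 6}, the search limit, and the helper `additive_closure` are assumptions chosen for the illustration.

```python
from functools import reduce
from math import gcd

def additive_closure(generators, limit):
    """Elements <= limit of the smallest set that contains `generators`
    and is closed under addition."""
    reachable = [False] * (limit + 1)
    for q in range(1, limit + 1):
        reachable[q] = q in generators or any(
            q > g and reachable[q - g] for g in generators
        )
    return {q for q in range(1, limit + 1) if reachable[q]}

generators = {6, 10, 15}
d = reduce(gcd, generators)            # greatest common divisor, here 1
T = additive_closure(generators, 200)

# Multiples of d up to the limit that are NOT in T: there are finitely many.
missing = [q for q in range(d, 201, d) if q not in T]
print(d, missing)

# A case with d = 2: every multiple of 2 from 4 on is in the set.
print(sorted(additive_closure({4, 6}, 20)))
```

For the first set the last missing integer is 29, so every integer from 30 up to the search limit lies in T.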

f. Renewal theorem. The Renewal Theorem, one of the important 
results in the elementary theory of probability, can be stated purely 
in terms of analysis. 

Theorem 1-67: Let $\{f_n\}$ be a sequence of non-negative real numbers
such that $\sum f_n = 1$ and $f_0 = 0$, and suppose the greatest common
divisor of those indices k for which $f_k > 0$ is 1. Set $\mu = \sum n f_n$, $u_0 = 1$,
and $u_n = \sum_{k=0}^{n-1} u_k f_{n-k}$. If $\mu$ is infinite, then $\lim_n u_n = 0$, and if $\mu$ is
finite, then $\lim_n u_n = 1/\mu$.

For a proof, see Feller [1957], pp. 306-307. 
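
The recursion in Theorem 1-67 is easy to iterate. The sketch below assumes the distribution $f_1 = f_2 = 1/2$, for which $\mu = 3/2$, and also shows a periodic case ($f_2 = 1$) in which the greatest common divisor hypothesis fails. The function name `renewal_sequence` and both distributions are illustrative assumptions.

```python
def renewal_sequence(f, n_max):
    """Return [u_0, ..., u_{n_max}] with u_0 = 1 and
    u_n = sum_{k=0}^{n-1} u_k f_{n-k}; f maps an index to f_index."""
    u = [1.0]
    for n in range(1, n_max + 1):
        u.append(sum(u[k] * f.get(n - k, 0.0) for k in range(n)))
    return u

f = {1: 0.5, 2: 0.5}                       # indices 1 and 2 have g.c.d. 1
mu = sum(n * p for n, p in f.items())      # mean, here 1.5
u = renewal_sequence(f, 60)
print(u[:4], u[-1], 1 / mu)

# Periodic case: the only index with positive weight is 2, so the g.c.d. is 2
# and the theorem does not apply; u_n oscillates instead of converging.
u_periodic = renewal_sequence({2: 1.0}, 7)
print(u_periodic)
```

In the first case $u_n$ settles at 2/3, the reciprocal of the mean, as the theorem predicts.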

g. Central Limit Theorem. Identically distributed independent 
random variables are defined in Sections 3-2 and 6-4. The mean of a 
random variable is its integral over its domain, and its variance is the 
integral of its square minus the square of its integral, a quantity which 
is always non-negative. 
Theorem 1-68: Let $\{y_n\}$ be a sequence of identically distributed
independent random variables with common mean $\mu$ and with common
finite variance $\sigma^2 > 0$. Set $s_n = y_1 + \cdots + y_n$. Then for all real $\alpha$
and $\beta$ with $\alpha < \beta$,

$$\lim_{n \to \infty} \Pr\left[\alpha < \frac{s_n - n\mu}{\sigma\sqrt{n}} < \beta\right] = \Phi(\beta) - \Phi(\alpha),$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du.$$

For a proof, see Doob [1953], p. 140. 
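
For a balanced coin the probability in Theorem 1-68 can be computed exactly from binomial coefficients, which gives a deterministic check of the limit. In this sketch each random variable is assumed to be 1 for heads and 0 for tails, so that $\mu = 1/2$ and $\sigma = 1/2$; the function names and the choice n = 2000 are assumptions.

```python
from math import comb, erf, sqrt

def phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def coin_clt_probability(n, alpha, beta):
    """Exact Pr[alpha < (s_n - n/2) / (sqrt(n)/2) < beta], where s_n is the
    number of heads in n tosses of a balanced coin."""
    mean, sd = n / 2.0, sqrt(n) / 2.0
    favorable = sum(
        comb(n, h) for h in range(n + 1) if alpha < (h - mean) / sd < beta
    )
    return favorable / 2 ** n

exact = coin_clt_probability(2000, -1.0, 1.0)
limit = phi(1.0) - phi(-1.0)
print(exact, limit)
```

The two printed values agree to about two decimal places; the remaining gap comes from the discreteness of $s_n$ and shrinks as n grows.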




CHAPTER 2 



STOCHASTIC PROCESSES 



1. Sequence spaces 

We shall introduce the concept of a stochastic process in this chapter 
and develop the basic tools needed to treat the processes. Before the 
formal development, we shall indicate the intuitive ideas underlying 
the formal definitions. 

We imagine that a sequence of experiments is performed. The
outcomes may be arbitrary elements of a specified set, such as the set
{"yes," "no," "no opinion"}, the set {heads, tails}, the set {fair,
cloudy, rainy, snowy}, or a set of numbers. The experiments may be
quite general in nature, but we impose some natural restrictions:

(1) The set of possible outcomes is denumerable. (This restriction 
is natural for the present book. It would be removed in a more 
general treatment of stochastic processes.) 

(2) The probability of an outcome for the nth experiment is
completely determined by a knowledge of the outcomes of earlier
experiments. Here "probability" is used heuristically, to motivate the
later precise definition.

(3) The experimenter is, at each stage, aware of the outcomes of 
earlier experiments. 

We shall first consider a sequence of n experiments, where n is 
specified at the beginning. Later we shall consider a denumerably 
infinite sequence of experiments. In each case we assume that the 
experiments do not stop earlier. However, this is an unimportant 
restriction; a process that terminates may be represented in our 
framework by allowing the outcome “stopped.” The following are 
examples of such sequences of experiments: 

(1) A sociologist wishes to find out whether people feel that television
is turning us into a nation of illiterates. He asks a carefully selected
sample of subjects and receives the answer "yes," "no," or "no
opinion" in each case.

(2) A gambler flips a coin repeatedly, recording "heads" or "tails."

(3) A meteorologist records the daily weather for 23 years, classifying 
each day as “fair,” “cloudy,” “rainy,” or “snowy.” 

(4) A physicist tries to determine the speed of light by making a 
series of measurements. (Since each measurement is recorded only to 
a certain number of decimal places, the possible outcomes are rational 
numbers, and hence the set is denumerable.) 

(5) A physicist makes a count of the number of radioactive particles 
given off by an ounce of uranium. A measurement is made every 
second, and the outcome of the nth measurement is the total number of 
particles given off until then. 

The exact way in which probabilities are determined from an
experiment is a deep problem in the philosophy of science, and it will not
concern us here. We will assume that the nature of the experiment
yields us certain probabilities, namely the probability that the nth
experiment results in an outcome c, given that the previous experiments
resulted in outcomes $c_0, c_1, \ldots, c_{n-1}$. We then design a probability
space in which one can compute the probability of a wide variety of
statements concerning the experiments and in which the specified
(conditional) probabilities turn out as given.

The elements of our probability space $\Omega$ are sequences of possible
outcomes for the experiments (either sequences of length n, for a finite
series of experiments, or infinite sequences). The elements of the Borel
field $\mathcal{B}$ of measurable sets will be the truth sets of statements to which
probabilities are to be assigned. (The truth set of a statement about
the experiments is the set of all those sequences in $\Omega$ for which the
statement is true.) A measure $\mu$ is constructed, and the probability
of a statement is the measure of its truth set. In particular, $\mu(\Omega) = 1$
in a probability space, and hence the probability of a logically true
statement is 1.

Let us first consider the case where n experiments are performed.
The possible outcomes are conveniently represented by a tree, with
each path through the tree representing a sequence of possible
outcomes. In the diagram n equals 3, $\Omega$ has 8 elements, and $\mathcal{B}$ consists
of all subsets of $\Omega$.

[Figure: tree diagram for three coin tosses; levels labeled "Experiment No. 0, 1, 2, 3"]

The numbers on the branches, known as branch weights, represent
the conditional probabilities mentioned above. For example, $p_2$ is the
probability of heads, given that the first toss came up heads, while
$1 - p_3$ is the probability that tails is the outcome that follows two
heads. The weight assigned to the path HHT is taken to be the
product of the branch weights, $p_1 p_2 (1 - p_3)$. The measure $\mu(A)$ of a
set of paths A is the sum of the weights of the paths in A. In the usual
setup for coin tossing, each p is 1/2 and the weights of the paths are
each 1/8.

The branch weights may be arbitrary non-negative numbers, but the 
sum of the weights of branches starting from a given branch-point must 
be one. 

After we define conditional probabilities, it will be easy to verify 
that the numbers written on the branches do indeed turn out to be the 
desired conditional probabilities (see Kemeny, Mirkil, et al. [1959], 
Chapter 3). 

Let us suppose that we have constructed a tree $\Omega_n$ for a series of n
experiments. We consider k additional experiments, obtaining a tree
$\Omega_{n+k}$. We wish our probabilities to be consistent in the sense that a
statement p about $\Omega_n$ has the same probability when computed in
either measure space. Our method of computing measures has this
consistency property. This assertion is easily verified for k = 1, and
the result follows by induction.
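
The path-weight construction and its consistency property can be tried out on a small tree. In this sketch the branch-weight rule `heads_probability` is a hypothetical choice (heads becomes more likely after each head already seen); it is not taken from the text, and any rule giving weights that sum to one at each branch point would serve.

```python
from fractions import Fraction
from itertools import product

def heads_probability(history):
    """Hypothetical branch weight: probability of H given the outcomes so far."""
    return Fraction(1 + history.count("H"), 2 + len(history))

def path_weight(path):
    """Product of the branch weights along `path`, a tuple of 'H' and 'T'."""
    weight = Fraction(1)
    for i, outcome in enumerate(path):
        p = heads_probability(path[:i])
        weight *= p if outcome == "H" else 1 - p
    return weight

def measure(statement, n):
    """Measure of the truth set of `statement` in the tree for n experiments."""
    return sum(path_weight(w) for w in product("HT", repeat=n) if statement(w))

def starts_hh(w):
    return w[:2] == ("H", "H")

print(path_weight(("H", "H", "T")))    # p_1 * p_2 * (1 - p_3)
print(measure(lambda w: True, 3))      # the whole tree has measure 1
print(measure(starts_hh, 2), measure(starts_hh, 5))
```

The statement "the first two tosses are heads" receives the same probability in the tree for 2 experiments and in the tree for 5, which is the consistency property described above.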

In constructing an infinite tree $\Omega$ for a sequence of experiments, our
measure is required to have the property that a statement about the
first n outcomes has the same probability as if computed with respect
to the finite tree $\Omega_n$. (Of course, the same probability may be
computed with respect to $\Omega_{n+k}$; the result is the same by consistency.)
This convention assigns probabilities to many simple statements. We
can then show that the probability of a much larger class of statements
is uniquely determined. We will now consider the problem abstractly.

Let S be a denumerable set; S is called the state space. Let $\Omega$ be the
set of all infinite sequences of elements of S. A typical element $\omega$ of
$\Omega$ is represented as

$$\omega = (c_0, c_1, c_2, \ldots),$$

where $c_0, c_1, c_2, \ldots$ are elements of S. The points $\omega$ of $\Omega$ are called
paths, the whole space $\Omega$ is called a sequence space, and the value $c_n$
on a path $\omega$ is called the nth outcome on $\omega$. The function $x_n$
defined from $\Omega$ to S by

$$x_n(c_0, \ldots, c_n, \ldots) = c_n$$

is called the nth outcome function, and the nth outcome is said to occur
at time n.

Let $\mathcal{F}_n$ be the family of all unions of sets in $\Omega$ of the form

$$\{\omega \mid x_0(\omega) \in S_0 \wedge x_1(\omega) \in S_1 \wedge \cdots \wedge x_n(\omega) \in S_n\},$$

where $S_0, S_1, \ldots, S_n$ are subsets of the state space S. (Notice that the
sets of $\mathcal{F}_n$ arise from the class of all subsets of the tree $\Omega_n$ described
above.) It is clear that for each n, $\mathcal{F}_n$ is a Borel field. Let $\mathcal{F}$ be the
family of sets defined by

$$\mathcal{F} = \bigcup_{n=0}^{\infty} \mathcal{F}_n.$$

Each set in $\mathcal{F}$ is a set of paths for which a finite number of outcomes
are restricted to lie in certain subsets of S. All other outcomes are
unrestricted. The reader should verify that $\mathcal{F}$ is a field of sets. In
Section 3 we shall see that $\mathcal{F}$ is not a Borel field; in the meantime, we
let $\mathcal{G}$ be the smallest Borel field containing $\mathcal{F}$ (Proposition 1-14).
After we have defined a measure on $\mathcal{G}$, the Borel field $\mathcal{B}$ which we are
seeking will be the augmented field obtained from $\mathcal{G}$ by adding subsets
of sets of measure zero.

The sets of $\mathcal{F}$ are known as cylinder sets. If a cylinder set C is a
set in $\mathcal{F}_n$, we note that C may be written as the denumerable union

$$C = \bigcup_i B_i^{(n)}$$

of (disjoint) basic cylinder sets

$$B_i^{(n)} = \{\omega \mid x_0(\omega) = c_0 \wedge x_1(\omega) = c_1 \wedge \cdots \wedge x_n(\omega) = c_n\}.$$

A basic cylinder set in $\mathcal{F}_n$ is the set of all possible continuations of a
single path in $\Omega_n$. We let $\nu(B_i^{(n)})$ equal the product of the branch
weights on this path in $\Omega_n$.

Recalling that the probability measures we assigned to $\Omega_n$ were
defined consistently, we can show that $\nu$ is uniquely defined on the sets
of $\mathcal{F}$. It has the properties that $\nu(\Omega) = 1$ and that the restriction of
$\nu$ to $\mathcal{F}_n$ is a measure for every n.

We will next show that $\nu$ can be extended to a measure $\mu$ on the
smallest Borel field containing $\mathcal{F}$. This result will be a consequence of
Theorem 2-4. First we prove a series of lemmas. In each case $\mathcal{F}_n$, $\mathcal{F}$,
and $\nu$ are as defined above.
Lemma 2-1: Let $\nu$ be a set function defined on $\mathcal{F} = \bigcup \mathcal{F}_n$ in such a
way that the restriction of $\nu$ to $\mathcal{F}_n$ is a measure for every n. Then
$\nu$ is non-negative and additive.

Proof: Non-negativity is trivial. For additivity, let A and B be
disjoint sets in $\mathcal{F}$. Then $A \in \mathcal{F}_m$ and $B \in \mathcal{F}_n$, say. Since the $\mathcal{F}_n$ are
nested, A and B are both in $\mathcal{F}_r$, where r = sup (m, n). Since $\nu$ is a
measure when restricted to $\mathcal{F}_r$,

$$\nu(A \cup B) = \nu(A) + \nu(B).$$

We shall in fact establish that $\nu$ is completely additive, a result due
to Kolmogorov.

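Lemma 2-2: Suppose $C_0 \supset C_1 \supset C_2 \supset \cdots$ is a sequence of sets in $\mathcal{F}$
such that $C_n \in \mathcal{F}_n$ and $\lim_n \nu(C_n) > 0$. Then for every m there is a
basic cylinder set $B^{(m)}$ in $\mathcal{F}_m$ such that

(1) $\lim_n \nu(C_n \cap B^{(m)}) > 0$

(2) $B^{(m)} \subset C_m$.

Proof: By complete additivity of $\nu$ on $\mathcal{F}_r$, where r = sup (m, n), we
have $\nu(C_n) = \sum_j \nu(C_n \cap B_j^{(m)})$ with $\nu(C_n \cap B_j^{(m)})$ monotonically
decreasing in n. Then

$$0 < \lim_n \nu(C_n) = \lim_n \sum_j \nu(C_n \cap B_j^{(m)}) = \sum_j \lim_n \nu(C_n \cap B_j^{(m)}).$$

The interchange of limit and sum is justified by dominated convergence
as follows: The functions of j, namely $\nu(C_n \cap B_j^{(m)})$, satisfy

$$\nu(C_n \cap B_j^{(m)}) \le \nu(C_0 \cap B_j^{(m)}),$$

and we know that $\sum_j \nu(C_0 \cap B_j^{(m)}) = \nu(C_0)$ is finite since $\nu(\Omega) = 1$.

Thus, since a denumerable sum cannot be positive unless one of the
terms is positive, we have for some j = i

$$\lim_n \nu(C_n \cap B_i^{(m)}) > 0.$$

Hence (1) is satisfied with $B^{(m)} = B_i^{(m)}$. But the terms in the sequence
$\nu(C_n \cap B_i^{(m)})$ monotonically decrease to a positive limit, so that

$$\nu(C_m \cap B_i^{(m)}) > 0.$$

Now $C_m \in \mathcal{F}_m$ and is thus the union of basic cylinder sets. Since
$C_m \cap B_i^{(m)}$ cannot be empty, we must have $B_i^{(m)} \subset C_m$.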
Lemma 2-3: Suppose $A_0 \supset A_1 \supset A_2 \supset \cdots$ is a sequence of sets in $\mathcal{F}$ such
that $A_n \in \mathcal{F}_n$ and $\lim_n \nu(A_n) = L > 0$. Then there exists a sequence
of basic cylinder sets $B^{(0)}, B^{(1)}, \ldots$ such that

(1') $B^{(n)}$ is a basic cylinder set of $\mathcal{F}_n$.

(2') $\lim_m \nu(A_m \cap B^{(n)}) > 0$.

(3') For n > 0, $B^{(n)} \subset B^{(n-1)}$.

(4') $B^{(n)} \subset A_n$.

Remark: Property (3') indicates that we are actually constructing a
single path by adjoining branches one at a time.

Proof of Lemma: The construction is by induction. For n = 0
apply Lemma 2-2 to the sequence $\{A_n\}$ and the integer m = 0. The
set that the lemma gives is $B^{(0)}$. Property (2') follows from (1),
(3') is vacuous, and (4') follows from (2). Suppose that we have
constructed $B^{(0)}, \ldots, B^{(n)}$; we want $B^{(n+1)}$. Let

$$C_k = \begin{cases} A_k \cap B^{(k)} & \text{for } k \le n \\ A_k \cap B^{(n)} & \text{for } k > n. \end{cases}$$

The sequence of sets $\{C_k\}$ is decreasing and $C_k \in \mathcal{F}_k$; we have

$$\lim_k \nu(C_k) > 0$$

by (2') of the inductive hypothesis. Applying Lemma 2-2 to $\{C_k\}$ and
the number m = n + 1, we obtain a basic cylinder set which
we take as $B^{(n+1)}$. By (2),

$$B^{(n+1)} \subset C_{n+1} = A_{n+1} \cap B^{(n)}.$$

Hence (1'), (3'), and (4') hold. By (1),

$$0 < \lim_k \nu(C_k \cap B^{(n+1)}) = \lim_k \nu(A_k \cap B^{(n)} \cap B^{(n+1)}) = \lim_k \nu(A_k \cap B^{(n+1)}) \quad \text{by (3')}.$$

Hence (2') holds.



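Theorem 2-4: Let $\Omega$ be a sequence space, let $\mathcal{F}_n$ be the Borel field
of sets generated by all statements

$$x_0 = c_0 \wedge x_1 = c_1 \wedge \cdots \wedge x_n = c_n,$$

and let $\mathcal{F} = \bigcup \mathcal{F}_n$. Suppose for every n there is a probability
measure $\nu_n$ defined on $\mathcal{F}_n$ with the property that the restriction of
$\nu_{n+1}$ to $\mathcal{F}_n$ is $\nu_n$. Let $\nu$ be the set function on $\mathcal{F}$ whose restriction to
$\mathcal{F}_n$ is $\nu_n$ for all n. Then $\nu$ is a measure.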
Proof: By Lemma 2-1 and the contrapositive of the second half of
Corollary 1-17, it is sufficient to show that if $A_0 \supset A_1 \supset A_2 \supset \cdots$ is
a decreasing sequence of sets in $\mathcal{F}$ with $\lim_n \nu(A_n) = L > 0$, then
$\bigcap_n A_n \ne \emptyset$. Since every set in $\mathcal{F}_n$ is a set in $\mathcal{F}_{n+1}$, we may assume
that $A_n \in \mathcal{F}_n$ by repeating the same set several times in the sequence
and by adding, if necessary, the set $\Omega$ a finite number of times at the
beginning of the sequence. The hypotheses of Lemma 2-3 are satisfied.
We then have $\bigcap_n B^{(n)} = \{\omega\}$, where $\omega$ is a single path of $\Omega$. For every
n, $\omega \in B^{(n)} \subset A_n$; hence $\omega \in \bigcap_n A_n$ and $\bigcap_n A_n \ne \emptyset$.

Applying Theorem 1-19 we extend the measure $\nu$ defined on $\mathcal{F}$
uniquely to a measure $\mu$ defined on the Borel field $\mathcal{G}$. Augmenting
the field $\mathcal{G}$, we obtain the Borel field $\mathcal{B}$ with which we shall work.

The extended measure $\mu$ is called tree measure and satisfies $\mu(\Omega) = 1$.
This completes our construction of the sequence space $(\Omega, \mathcal{B}, \mu)$.

From now on when we refer to a sequence space we shall mean $(\Omega, \mathcal{B}, \mu)$
constructed in the indicated manner.

2. Denumerable stochastic processes 

We turn now to the definition of a denumerable stochastic process. 
After the definition we shall show that every sequence space defines in 
a natural way a stochastic process and that every denumerable sto- 
chastic process can, in a way to be described shortly, be considered as a 
sequence space. 

Let S be a denumerable set, which will be called the state space, and
let $(\Omega, \mathcal{B}, \mu)$ be a probability space. Points of $\Omega$ will be denoted $\omega$.

Definition 2-5: Let $\{f_n\}$ be a sequence of functions with domain $\Omega$
and range in S, and let $\{\mathcal{R}_n\}$ be a sequence of Borel fields. The pair
$(f_n, \mathcal{R}_n)$ is called a denumerable stochastic process on $\Omega$ if these two
conditions are satisfied:

(1) $\mathcal{R}_0 \subset \mathcal{R}_1 \subset \mathcal{R}_2 \subset \cdots$; for every n, $\mathcal{R}_n \subset \mathcal{B}$.

(2) For every fixed n and for each $s \in S$, the set $\{\omega \mid f_n(\omega) = s\}$ is a
set in $\mathcal{R}_n$.

The second condition in the definition is a measurability requirement
on $f_n$. If we were to think of the family of all subsets of S as a Borel
field, our condition would be equivalent to the demand that the
inverse image under $f_n$ of any set in that Borel field be a set in $\mathcal{R}_n$.

First we shall show that every sequence space defines a stochastic
process in a natural way. Let $(\Omega, \mathcal{B}, \mu)$ be a sequence space. We
take the outcome functions $\{x_n\}$ as the sequence of functions and $\{\mathcal{F}_n\}$
as the sequence of Borel fields. It is clear that

$$\{\omega \mid x_n(\omega) = c\}$$

is a set of $\mathcal{F}_n$ and that $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F} \subset \mathcal{G} \subset \mathcal{B}$. The pair
$(x_n, \mathcal{F}_n)$ therefore is a stochastic process and is referred to simply as
$\{x_n\}$. We have thus shown that the outcomes of a sequence of
experiments with a denumerable range form a denumerable stochastic
process.

When we begin to discuss Markov chains, we shall wish to confine 
ourselves to stochastic processes defined on a sequence space. There- 
fore, our second task is to show that the restriction to sequence space is 
actually no restriction at all; this will indicate that our treatment of 
denumerable Markov chains is completely general. 

Let $(f_n, \mathcal{R}_n)$ be a denumerable stochastic process on $\Omega'$ with state
space S. Let $\mu'$ be a measure on $\Omega'$. We shall construct a sequence
space $(\Omega, \mathcal{B}, \mu)$ in such a way that the behavior of the process $(f_n, \mathcal{R}_n)$
on $\Omega'$ may be studied completely by studying the stochastic process
$\{x_n\}$ on $\Omega$.

Define $\Omega$ to be the space of all sequences of elements of S, and let $\mathcal{B}$
be the Borel field obtained by the construction in Section 2-1. We
shall establish a correspondence between paths of $\Omega$ and subsets of $\Omega'$,
and we define a measure $\mu$ on $\mathcal{B}$.

The correspondence we choose is

$$\omega \leftrightarrow \{\omega' \mid f_0(\omega') = x_0(\omega) \wedge \cdots \wedge f_n(\omega') = x_n(\omega) \wedge \cdots\}.$$

To assign a measure to $\Omega$, we must assign a measure to cylinder sets,
such as

$$A = \{\omega \mid x_0(\omega) \in S_0 \wedge \cdots \wedge x_n(\omega) \in S_n\}.$$

Noticing that the set A', defined by

$$A' = \{\omega' \mid f_0(\omega') \in S_0 \wedge \cdots \wedge f_n(\omega') \in S_n\},$$

is a set in $\mathcal{R}_n$ and is therefore $\mu'$-measurable, we define

$$\mu(A) = \mu'(A').$$

The measure $\mu$ can then be extended to a measure defined on all of
$\mathcal{B}$, and the construction of the space $(\Omega, \mathcal{B}, \mu)$ is complete. We thus see
that an arbitrary stochastic process defined on $\Omega'$ may be considered
as a process on a suitable sequence space $\Omega$ in which the $x_n$ are
outcome functions. The probability of any statement concerning the $f_n$
can be computed in the sequence space.
3. Borel fields in stochastic processes

Probabilities are numbers assigned to statements about stochastic
processes. We may now formally define the probability of a statement
to be the measure of the statement's truth set. In symbols $\Pr[p] =
\mu(P)$, where $P = \{\omega \mid p\}$. If the set P is not a set in the Borel field
on which $\mu$ is defined, then Pr[p] is undefined. Statements for which
Pr[p] is defined are called measurable statements.

In a stochastic process we see that a Borel field of sets represents a
state of knowledge. The more sets there are in the Borel field, the more
statements there are that we know how to assign probabilities to. Let
us analyze briefly what this fact implies about Definition 2-5.

In a denumerable stochastic process we are given an increasing
sequence of Borel fields $\mathcal{R}_n$ such that $\mathcal{R}_n \subset \mathcal{B}$ for every n. The field
$\mathcal{R}_n$ represents the state of knowledge of the process up to time n. The
fact that the Borel fields are increasing means that our knowledge of
the process never decreases as time goes on, and the fact that all
$\mathcal{R}_n$ are contained in $\mathcal{B}$ means that our total knowledge of the process
necessarily includes knowledge of what happens in a finite number of
steps.

Similarly, condition (2) in the definition is an abstract formulation 
of the requirement that in a stochastic process the present does not 
depend upon the future. We conclude, therefore, that our definition 
satisfies the conditions imposed at the beginning of Section 1. 

We shall apply this insight about the role of Borel fields to a specific
example to show that the field $\mathcal{F}$ in Section 1 is not the same as the
Borel field $\mathcal{G}$. Let $\Omega$ be the sequence space constructed when S is taken
as a two-element set. Measures are assigned to each set B of
paths of the form

$$B = \{\omega \mid x_0 = c_0 \wedge \cdots \wedge x_n = c_n\}.$$

This measure is eventually extended to a measure $\mu$ on the Borel field
$\mathcal{B}$, and we obtain $\mu(\Omega) = 1$. The state space for this example is often
taken as S = {heads, tails}, and the model for the stochastic process is
the tossing of a balanced coin.

For the coin-tossing process a well-known example of a statement
whose truth set is not in the field $\mathcal{F}$ but is in the Borel field $\mathcal{G}$ is involved
in the Strong Law of Large Numbers (which we shall consider in more
detail in Chapter 3). Let k and n be integers and let $h_n$ be the fraction
of the first n outcomes which are heads. Let p be the statement about
$h_n$ that

$$\left| h_n - \tfrac{1}{2} \right| < \frac{1}{k}.$$

Consider the statement q that

$$(\forall k > 0)(\exists N)(\forall n > N)\; p.$$

We write the statement in this form to demonstrate that its truth set
is in $\mathcal{G}$: the truth set is built from truth sets of p, which lie in $\mathcal{F}$, by
denumerable unions and intersections. In words, the statement q
asserts that for any k > 0 there exists an N such that if n > N, then
$|h_n - \tfrac{1}{2}| < 1/k$. That is, q says that $\lim_{n \to \infty} h_n = \tfrac{1}{2}$. The truth set
of the statement q is in $\mathcal{G}$ but not in $\mathcal{F}$
because the notion of limit cannot be expressed in terms of finitely
many of the $x_n$. The Strong Law of Large Numbers asserts that

$$\Pr[q] = 1.$$



4. Statements of probability zero or one 

In a probability space $(\Omega, \mathcal{B}, \mu)$ a statement with truth set $\Omega$ is
logically true, whereas a statement with truth set $\emptyset$ is logically false.
However, the Strong Law of Large Numbers asserts not that a certain
set is $\Omega$ but that it has measure one. A statement whose truth set
has measure one is said to be almost always, almost everywhere, almost
surely true, or true a.e. Correspondingly, a statement with truth set
of measure zero is almost surely false, and the negation of a
statement p which is almost surely false is almost surely true.

Two useful propositions are related to the subject of almost sure 
statements. 

Proposition 2-6: Let $\{p_n\}$ be a denumerable set of statements and let
q be the statement that $p_n$ holds for all n. If $\Pr[p_n] = 1$ for all n,
then Pr[q] = 1.

Proof: Let $\{P_n\}$ be the truth sets of the statements $\{p_n\}$. Applying
Proposition 1-18, we have

$$1 - \Pr[q] = \mu\Big(\Omega - \bigcap_n P_n\Big) = \mu\Big(\bigcup_n (\Omega - P_n)\Big) \le \sum_n \mu(\Omega - P_n) = 0,$$

so that

$$\Pr[q] = 1.$$

The second result is one of the Borel-Cantelli Lemmas. 

Proposition 2-7: If $\{p_n\}$ is a sequence of statements for which
$\sum \Pr[p_n]$ is finite, then with probability one only finitely many of the
$p_n$ are true.

Proof: Let $q_N$ be the statement that all of the statements $p_N, p_{N+1}, \ldots$
are false. Let $\epsilon > 0$ be given. Choose N large enough so that
$\sum_{n=N}^{\infty} \Pr[p_n] < \epsilon$. Then

$$1 - \Pr[q_N] = \Pr[\text{at least one of } p_N, p_{N+1}, \ldots \text{ occurs}] = \Pr[p_N \vee p_{N+1} \vee \cdots].$$

By Proposition 1-18 the right side is

$$\le \sum_{n=N}^{\infty} \Pr[p_n] < \epsilon.$$

Hence,

$$\Pr[\text{finitely many } p_n \text{ are true}] = \Pr\Big[\bigvee_N q_N\Big] \ge \Pr[q_N] > 1 - \epsilon.$$

Since this inequality holds for every $\epsilon > 0$, the probability must be 1.



5. Conditional probabilities 

Let $(\Omega, \mathcal{B}, \mu)$ be a probability space. If p and q are statements
such that $\Pr[q] \ne 0$, the conditional probability of p given q, written
$\Pr[p \mid q]$, is defined by

$$\Pr[p \mid q] = \Pr[p \wedge q] / \Pr[q].$$

If Pr[q] = 0, we shall normally agree that $\Pr[p \mid q] = 0$. (Alternatively,
we might leave $\Pr[p \mid q]$ undefined if Pr[q] = 0. Such a convention
would be adopted in a more general context.)

The case Pr[q] = 0 is not very interesting. Suppose $\Pr[q] \ne 0$, and
let $Q \subset \Omega$ be the truth set of q. If Q is taken as a space of points, then
$\mathcal{B}' = \{B' \mid B' = Q \cap B,\ B \in \mathcal{B}\}$ is a Borel field of sets in Q. For any
set B' in $\mathcal{B}'$ we define a set function $\nu$ by

$$\nu(B') = \frac{\mu(B')}{\mu(Q)}.$$

The reader should verify that $\nu$ is a measure on $\mathcal{B}'$. Furthermore,
$\nu(Q) = \mu(Q)/\mu(Q) = 1$. Therefore, $(Q, \mathcal{B}', \nu)$ is a probability space.
We may thus speak of the probabilities of statements relative to Q,
and we see that their values coincide with conditional probabilities
given q and relative to $\Omega$. That is, conditional probabilities possess
the same properties as ordinary probabilities.
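
The restricted measure can be written out on a finite example. The sketch below assumes a toy space of three tosses of a balanced coin, with each path of weight 1/8; the statements p and q and the helper names are hypothetical choices for the illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed toy space: three tosses of a balanced coin, each path of weight 1/8.
omega = list(product("HT", repeat=3))
mu = {w: Fraction(1, 8) for w in omega}

def pr(statement, measure=mu):
    """Measure of the truth set of `statement`."""
    return sum(m for w, m in measure.items() if statement(w))

def conditional_measure(q):
    """The measure nu(B') = mu(B') / mu(Q) on the truth set Q of q."""
    total = pr(q)
    return {w: m / total for w, m in mu.items() if q(w)}

def q(w):
    return w.count("H") >= 2     # at least two heads

def p(w):
    return w[0] == "H"           # the first toss is a head

nu = conditional_measure(q)
print(sum(nu.values()))                      # nu(Q) = 1
print(pr(p, nu))                             # probability of p relative to Q
print(pr(lambda w: p(w) and q(w)) / pr(q))   # Pr[p | q] from the definition
```

The last two values coincide, which is the sense in which probabilities relative to Q agree with conditional probabilities given q.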

We shall apply this notion of conditional probability to the sequence
space considered in the preceding sections. In this space the sets of
$\mathcal{F}_n$ are denumerable disjoint unions of sets of the form

$$B_n = \{\omega \mid x_0 \in S_0 \wedge \cdots \wedge x_n \in S_n\}.$$

Since the state space S is denumerable, each of the sets $S_0, S_1, \ldots, S_n$
is denumerable and $B_n$ is the denumerable union of disjoint sets of the
form $\{\omega \mid x_0 = c_0 \wedge \cdots \wedge x_n = c_n\}$. By definition of conditional
probability,

$$\Pr[x_0 = c_0 \wedge \cdots \wedge x_n = c_n] = \Pr[x_0 = c_0] \cdot \Pr[x_1 = c_1 \mid x_0 = c_0] \cdots \Pr[x_n = c_n \mid x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1}].$$

We have established the following result.



Proposition 2-8: The measure on a sequence space is completely
determined by

(1) the starting probabilities, $\Pr[x_0 = c_0]$, and

(2) the transition probabilities,

$$\Pr[x_n = c_n \mid x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1}].$$

Conditional probabilities, as we anticipated, have their place in tree
diagrams. To each branch we may assign a conditional probability.
We see now the abstract formulation of the fact that the probabilities
of statements like $x_0 = c_0 \wedge x_1 = c_1 \wedge \cdots \wedge x_n = c_n$ are computed
simply by multiplying together the appropriate branch weights.
Two statements p and q are said to be probabilistically independent
if $\Pr[p \wedge q] = \Pr[p]\Pr[q]$. A stochastic process defined on sequence
space is called an independent process if the statements

$$x_n = c_n$$

and

$$x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1}$$

are independent for all $n \ge 1$ and for all $c_0, c_1, \ldots, c_n$. Coin tossing is
an example. We shall see that an independent process is a special
kind of Markov process. If, in addition, for each c the probability
that $x_n = c$ does not depend on n, then the process is called an
independent trials process.



6. Random variables and means 

Let $(\Omega, \mathcal{B}, \mu)$ be a probability space. A measurable function f with
domain $\Omega$ and range in the extended real-valued number system is
called a random variable. We may apply all the properties of the
measurable functions in Section 3 of Chapter 1.

If $f(\omega)$ is an extended-real-valued function defined on the space $\Omega$
of a stochastic process and if f has the property that for some n, $f(\omega)$
is measurable with respect to the Borel field $\mathcal{R}_n$, then f is a random
variable because $\mathcal{R}_n \subset \mathcal{B}$. Such a function is said to be $\mathcal{R}_n$-measurable.
For the special case in which $\Omega$ is sequence space, a function f is
$\mathcal{F}_n$-measurable for some n if its values depend only on a bounded number
of outcomes.

In terms of sequence space, we give two examples of random 
variables. 

(1) Define a function $u_j^{(n)}$ by

$$u_j^{(n)}(\omega) = \begin{cases} 1 & \text{if } x_n(\omega) = j \\ 0 & \text{otherwise.} \end{cases}$$

Since the value of $u_j^{(n)}$ depends only on outcome n, the function $u_j^{(n)}$
is $\mathcal{F}_n$-measurable and is therefore a random variable. Letting $n_j =
\sum_{n=0}^{\infty} u_j^{(n)}$ and noting that the limit of a sequence of random variables
is a random variable, we see that $n_j$ is a random variable. The
function $n_j(\omega)$ counts the number of times that the outcome j occurs on the
path $\omega$; it will appear again after we introduce Markov chains.

(2) Let $t_j$ be the time to reach j. That is, define $t_j(\omega)$ to be the first
time n along path $\omega$ such that $x_n(\omega) = j$. If j is never reached, set
$t_j = +\infty$. Then $t_j$ is a random variable because it is the limit on n of
the function $t_j^{(n)} = \inf(t_j, n)$, which is $\mathcal{F}_n$-measurable.
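
The time to reach j and its truncation can be sketched on finite path prefixes. The function names below are hypothetical. An infinite path cannot be stored, so in this sketch a return value of infinity only means that j does not occur within the prefix supplied.

```python
import math

def hitting_time(path, j):
    """First n with path[n] == j, or +infinity if j does not occur in `path`."""
    for n, outcome in enumerate(path):
        if outcome == j:
            return n
    return math.inf

def truncated_hitting_time(path, j, n):
    """inf(t_j, n), computed from the outcomes x_0, ..., x_n only."""
    return min(hitting_time(path[: n + 1], j), n)

path = "TTHTH"
print(hitting_time(path, "H"))
print([truncated_hitting_time(path, "H", n) for n in range(5)])
```

The truncated times increase with n until they reach the time to reach j and then stay there, which is the limit used in example (2). Because the truncated time looks only at the outcomes up to time n, it is determined by finitely many outcomes.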
Definition 2-9: A random time t is a random variable satisfying these
two properties:

(1) Its range is in the non-negative integers with $\{+\infty\}$ adjoined.

(2) For each integer n the set $\{\omega \mid t(\omega) = n\}$ is a set of $\mathcal{R}_n$.

The random variable $t_j$ defined in the second example above is an
example of a random time.

The mean of a random variable f, denoted M[f], is defined by
$M[f] = \int_\Omega f\,d\mu$, where M[f] exists if and only if $\int_\Omega f\,d\mu$ exists and where $\mu$
is the measure associated with the probability space. Since means are
Lebesgue integrals, they satisfy all the properties of Lebesgue integrals.
In particular, if $\{f_n\}$ is a sequence of non-negative random variables,
then by Corollary 1-46,

$$M\Big[\sum_{n=0}^{\infty} f_n\Big] = \sum_{n=0}^{\infty} M[f_n].$$

In addition, if $\{g_n\}$ is a sequence of random variables with the properties
that $|g_n| \le c$ for every n and that $g_n$ converges to g, then

$$M[g] = \lim_{n \to \infty} M[g_n]$$

by Corollary 1-50. An important application of these facts is the
following result.



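Proposition 2-10: In a sequence space,

$$M[n_j] = \sum_{n=0}^{\infty} \Pr[x_n = j].$$

Proof:

$$\begin{aligned}
M[n_j] &= M\Big[\sum_n u_j^{(n)}\Big] \\
&= \sum_n M[u_j^{(n)}] \\
&= \sum_n \int_{\{\omega \mid x_n(\omega) = j\}} 1\,d\mu \\
&= \sum_n \Pr[x_n = j].
\end{aligned}$$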
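
A finite-horizon version of Proposition 2-10 can be checked by exact enumeration. The sketch below is an illustration under stated assumptions, not the book's construction: the process is coin tossing with a fixed probability of heads, only the first `horizon` outcomes are counted, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean_visits(horizon, j, p_heads=Fraction(1, 2)):
    """Mean number of times outcome j occurs among x_0..x_{horizon-1},
    computed by summing over every path of that length."""
    total = Fraction(0)
    for path in product("HT", repeat=horizon):
        weight = Fraction(1)
        for outcome in path:
            weight *= p_heads if outcome == "H" else 1 - p_heads
        total += weight * sum(1 for outcome in path if outcome == j)
    return total

def sum_of_probabilities(horizon, j, p_heads=Fraction(1, 2)):
    """Sum over n < horizon of Pr[x_n = j] for independent tosses."""
    p_j = p_heads if j == "H" else 1 - p_heads
    return sum(p_j for _ in range(horizon))
```

Both functions return the same exact fraction for any horizon, which is the interchange of mean and sum used in the proof.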



7. Means conditional on statements 

Suppose $(\Omega, \mathcal{B}, \mu)$ is a probability space on which a
denumerable stochastic process is defined. We say that a denumerable
family of subsets $\mathcal{R} = \{R_j\}$ is a partition of $\Omega$ if the
sets $R_j$ are disjoint, exhaustive, and measurable. Each such subset $R_j$
is called a cell of the partition; we allow the possibility that a cell may
have measure zero. The reader should notice that the sets
$\{\emptyset, R_1, R_2, \ldots\}$ together with all possible denumerable
unions form a Borel field which we shall call $\mathcal{R}^*$. Since the
sets in $\mathcal{R}$ are measurable, we see that
$\mathcal{R}^* \subset \mathcal{B}$. Two examples of partitions are typical.

(1) Let the process be coin tossing, and define a partition by

$$R_1 = \{\omega \mid x_0(\omega) \text{ is a head}\}, \qquad R_2 = \{\omega \mid x_0(\omega) \text{ is a tail}\}.$$

More generally, let $\mathcal{R}$ consist of disjoint exhaustive measurable
sets which are in $\mathcal{F}_n$ for some fixed $n$.

(2) Suppose $f$ is a random variable whose range is denumerable.
If the range is $\{a_j\}$, define $R_j = \{\omega \mid f(\omega) = a_j\}$.
Then $\{R_j\}$ is a partition.

A denumerable set of statements $\{q_i\}$ about a stochastic process is
said to be a set of alternatives if the truth sets $Q_i$ of the statements
form a partition. Since the integral is a completely additive set function,
we obtain, for every random variable $f$ whose mean exists, the relation

$$M[f] = \sum_i \int_{Q_i} f\,d\mu.$$



Let $p$ be a statement with measurable truth set $P$. If $f$ is a random
variable and if $\Pr[p] \ne 0$, then the conditional mean of $f$ given $p$,
written $M[f \mid p]$, is defined by

$$M[f \mid p] = \frac{1}{\Pr[p]} \int_P f\,d\mu.$$

If $\Pr[p] = 0$, then $M[f \mid p]$ is defined to be zero. (In a more
general setting, $M[f \mid p]$ is not defined when $\Pr[p] = 0$.)

Proposition 2-11: If $M[f]$ exists and $\{q_i\}$ is a set of alternatives
with truth sets $\{Q_i\}$, then

$$M[f] = \sum_i \Pr[q_i] \cdot M[f \mid q_i].$$

Proof: By definition of conditional mean, we have

$$\int_{Q_i} f\,d\mu = \Pr[q_i] \cdot M[f \mid q_i].$$

Summing both sides of the equation on $i$ gives the result immediately.

Corollary 2-12: Let $p$ be a statement with measurable truth set $P$,
and let $\{q_i\}$ be a set of alternatives with truth sets $Q_i$. Then

$$\Pr[p] = \sum_i \Pr[q_i] \cdot \Pr[p \mid q_i] = \sum_i \Pr[p \wedge q_i].$$

Proof: Let $f$ be the characteristic function of the set $P$ and apply
Proposition 2-11.
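
Proposition 2-11 and Corollary 2-12 can be verified exactly on a small finite space. The sketch below assumes a hypothetical example that is not in the text: two fair dice, $f$ equal to the sum of the faces, and the alternatives $q_i$ = "the first die shows $i$". The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probability space: ordered pairs of fair dice, each of measure 1/36.
OMEGA = list(product(range(1, 7), repeat=2))
MU = {w: Fraction(1, 36) for w in OMEGA}

def prob(event):
    """Pr[p] for a statement given as a predicate on points."""
    return sum(MU[w] for w in OMEGA if event(w))

def mean(f):
    """M[f] as a finite sum."""
    return sum(f(w) * MU[w] for w in OMEGA)

def cond_mean(f, event):
    """M[f | p]; defined to be zero when Pr[p] = 0, as in the text."""
    p = prob(event)
    if p == 0:
        return Fraction(0)
    return sum(f(w) * MU[w] for w in OMEGA if event(w)) / p

def total(w):
    return w[0] + w[1]

def sum_is_eight(w):
    return total(w) == 8

def indicator(event):
    """Characteristic function of the truth set of a statement."""
    return lambda w: 1 if event(w) else 0

# Alternatives: the first die shows i, for i = 1..6.
alternatives = [lambda w, i=i: w[0] == i for i in range(1, 7)]

weighted_mean = sum(prob(q) * cond_mean(total, q) for q in alternatives)
weighted_prob = sum(prob(q) * cond_mean(indicator(sum_is_eight), q)
                    for q in alternatives)
```

Here `weighted_mean` reproduces `mean(total)` and `weighted_prob` reproduces `prob(sum_is_eight)`, the second being the corollary applied to a characteristic function.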



8. Problems 

1 . For coin tossing, show that 

(a) the probability of getting only finitely many “heads” is 0; 

(b) there will be infinitely many “heads” a.e. 

2. Consider the experiment of selecting “1” with probability “2” with 
probability or “3 ” with probability If this experiment is repeated 
infinitely often, show that the probability of selecting “1 ” only finitely 
often is 0. 

3. In Problem 2, let Which of the following /„ form a denumer- 
able stochastic process (/^, ? 

(a) $f_n$ = the $n$th outcome.

(b) $f_n$ = time at which the $n$th “3” is selected.

(c) $f_n = s_n$ = sum of the first $n$ numbers selected.

(d) fn = where a and c are constants. 

^ ^ Vcn 

4. Show that the following converse of the Borel-Cantelli Lemma is false:
If $\sum \Pr[p_n] = +\infty$, then the probability that only finitely many
$p_n$ are true is less than 1.

5. Start with 4 jacks and 4 queens from a bridge deck. Find the prob- 
ability of drawing two cards, both of which are jacks. Then compute 
the conditional probability of the same on each of the following 
conditions : 

(a) One card drawn is a jack. 

(b) One card drawn is a red jack. 

(c) One card drawn is the jack of hearts. 

(d) The first card drawn is the jack of hearts. 

6. For coin tossing, define three random variables which are not measurable
with respect to any finite tree $\mathcal{F}_n$.

7. Let ${}^{i}n_j(\omega)$ be the number of times on path $\omega$ that a
$j$ occurs before the first $i$ occurs (if $i = j$, take
${}^{i}n_j(\omega) = 0$). Prove that ${}^{i}n_j$ is a random variable, and
develop an infinite series representation for $M[t_i]$ in terms of it.

8. In coin tossing, let $t$ be the first time that “heads” comes up.

(a) Is $t$ a random time?

(b) Find $M[t]$.

(c) Find $M[t \mid \text{first outcome is “tails”}]$.

9. In a randomly selected two-child family, let $f$ = the number of boys.
Find

(a) $M[f]$.

(b) $M[f \mid \text{first child is a boy}]$.

(c) $M[f \mid \text{there is at least one boy}]$.

10. For coin tossing, let $f$ = the number of “heads” in the first three
tosses. Let $q_i$ be the statement that there were exactly $i$ “heads” in
the first two tosses ($i = 0, 1, 2$). Find $M[f]$, $M[f \mid q_0]$,
$M[f \mid q_1]$, and $M[f \mid q_2]$. Verify that the first of these is a
linear combination of the last three with appropriate probabilities as
weights.

11. Let $\{q_i\}$ be a set of alternatives and let $f$ be a non-negative
random variable. Prove that if $M[f \mid q_i] > M[f]$, then there is an
alternative $q_k$ such that $M[f \mid q_k] < M[f]$.

12. Let $X$ be the unit interval. Let $\mathcal{F}$ consist of all finite
unions of finite intervals (with or without endpoints). We classify a point
as an interval of length 0. The measure to be constructed in this problem
is called Lebesgue measure, and the Borel field is the class of all Borel
sets.

(a) Show that $\mathcal{F}$ is a field.

(b) Show that every set $A$ in $\mathcal{F}$ can be written as a finite
disjoint union of intervals $A_1, A_2, \ldots, A_n$.

(c) If $A$ is decomposed as in part (b), we define

$$\nu(A) = \sum_{k=1}^{n} l(A_k),$$

where $l$ denotes length. Show that $\nu$ is consistently defined and that
$\nu$ is a non-negative additive set function.

(d) Show that if $A$ is a finite union of intervals and if $\epsilon > 0$ is
given, then there is a finite union $K$ of closed intervals and a finite
union $G$ of open intervals such that $K \subset A$, $A \subset G$, and

$$\nu(K) + \epsilon > \nu(A) > \nu(G) - \epsilon.$$

[Note: An open interval here means a set which is the intersection of
$X$ with an open interval of the real line.]

(e) Let $A$ and $A_n$, $n = 1, 2, \ldots$, be finite unions of intervals
with the $A_n$ disjoint and with $\bigcup A_n = A$. Use parts (c) and (d)
and the Heine-Borel Theorem to prove that, for any $\epsilon > 0$,

$$\nu(A) \le \sum_n \nu(A_n) + \epsilon.$$

(f) Deduce from part (e) that $\nu$ is completely additive on $\mathcal{F}$.

(g) Apply Theorem 1-19 and describe the resulting measure space.

(h) Using complete additivity, prove that a denumerable set of points
has measure 0.

(i) Why does the proof of (h) not show that every set has measure 0?

(j) Show directly from the definition

$$\mu(A) = \inf \Big[\sum_n \nu(A_n)\Big]$$

that every denumerable set of points has measure 0.

13. Let $(\Omega, \mathcal{B}, \mu)$ be the unit interval with Lebesgue
measure. If $x \in \Omega$, let

f(x) = x^. Let p be the statement “a: < 

(a) Show that $f$ is a random variable.

(b) Find $M[f]$.

(c) Find $M[f \mid p]$ and $M[f \mid \neg p]$.

(d) Relate your answer in (b) to the solution of (c). 



CHAPTER 3 



MARTINGALES 



1. Means conditional on partitions and functions 

In this chapter we consider a natural abstraction of the idea of a 
fair game in gambling. We shall give several applications of the basic 
result, the Martingale Convergence Theorem. 

We begin by defining what we mean by the conditional mean of $f$
given a partition $\mathcal{R}$ of the domain. Let
$(\Omega, \mathcal{B}, \mu)$ be a measure space. We shall normally assume
$\mu(\Omega) = 1$, but such an assumption is not necessary as long as $\mu$
is finite.

Definition 3-1: Let $\omega \in \Omega$ and let $r_j$ be the statement that
$\omega$ is in a cell $R_j$ of $\mathcal{R}$. If $f$ is a random variable,
then the conditional mean of $f$ given $\mathcal{R}$, written
$M[f \mid \mathcal{R}]$, is defined to be a function of $\omega$ whose value
at every point in the cell $R_j$ is the constant $M[f \mid r_j]$.

Next, we observe that if $M[f]$ exists and is finite, then
$M[f \mid \mathcal{R}]$ exists and is finite. For, on cells of measure zero,
the conditional mean is defined to be zero. If $\mu(R_j) > 0$, then on $R_j$

$$M[|f| \mid \mathcal{R}] = M[|f| \mid r_j] \le \frac{1}{\mu(R_j)} M[|f|] < \infty.$$

The next proposition provides an equivalent definition of the mean
of $f$ given $\mathcal{R}$.

Proposition 3-2: $M[f \mid \mathcal{R}]$ is characterized almost everywhere
by these two properties:

(1) $M[f \mid \mathcal{R}]$ is constant on each cell of $\mathcal{R}$.

(2) $\int_{R_j} f\,d\mu = \int_{R_j} M[f \mid \mathcal{R}]\,d\mu.$

Proof: We must show that $M[f \mid \mathcal{R}]$ has these two properties
and that any random variable satisfying these properties is equal to the
conditional mean of $f$ given $\mathcal{R}$ a.e. Set
$g = M[f \mid \mathcal{R}]$, and suppose first that $g$ is known to be a
conditional mean. Then $g$ satisfies (1) by definition. If $\mu(R_j) = 0$,
both sides of (2) are 0. If not, equality again follows from the definition
of $M[f \mid \mathcal{R}]$.

Conversely, if a function $g$ satisfies (1) and (2), then it agrees with
$M[f \mid \mathcal{R}]$ on all paths in cells of positive measure; that is,
it agrees with $M[f \mid \mathcal{R}]$ a.e.

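We give two examples of conditional means.

(1) Let the stochastic process be coin tossing and let

$$R_1 = \{\omega \mid x_0(\omega) \text{ is a head}\}, \qquad R_2 = \{\omega \mid x_0(\omega) \text{ is a tail}\}.$$

Let $f$ be the total number of heads on the zeroth and first tosses. Then

$$M[f \mid \mathcal{R}] = \begin{cases} \tfrac{3}{2} & \text{on } R_1 \\ \tfrac{1}{2} & \text{on } R_2 \end{cases}$$

and

$$M[f] = \tfrac{1}{2}\big(\tfrac{3}{2}\big) + \tfrac{1}{2}\big(\tfrac{1}{2}\big) = 1.$$

(2) For any stochastic process and for any denumerable-valued
random variable $f$, let $\mathcal{R}$ be the trivial partition
$\{\Omega\}$. Then

$$M[f \mid \mathcal{R}] = M[f]$$

on every path $\omega$.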

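A partition $\mathcal{R}$ is said to be contained in $\mathcal{S}$, written
$\mathcal{R} \subset \mathcal{S}$, if every cell of $\mathcal{R}$ is a union
of cells of $\mathcal{S}$. If $\mathcal{R} \subset \mathcal{S}$, then
$\mathcal{R}$ is the “coarser” subdivision, and $\mathcal{S}$ is called a
refinement of $\mathcal{R}$.

Proposition 3-3: Conditional means satisfy these properties:

(1) $M[M[f \mid \mathcal{S}] \mid \mathcal{R}] = M[f \mid \mathcal{R}]$ if
$\mathcal{R} \subset \mathcal{S}$.

(2) $M[M[f \mid \mathcal{S}]] = M[f]$ for any $\mathcal{S}$.

(3) If $g$ is constant on each cell of $\mathcal{R}$, then
$M[g \mid \mathcal{R}] = g$ a.e., and if $M[fg]$ exists and is finite, then
$M[fg \mid \mathcal{R}] = g\,M[f \mid \mathcal{R}]$.

(4) If $f$ and $g$ assume only finite range values and if
$M[f \mid \mathcal{R}]$ and $M[g \mid \mathcal{R}]$ both exist and are
finite, then $M[f + g \mid \mathcal{R}]$ exists, is finite, and is given by

$$M[f + g \mid \mathcal{R}] = M[f \mid \mathcal{R}] + M[g \mid \mathcal{R}].$$

Proof: For (1) it is sufficient to show that

$$\int_{R_j} M[M[f \mid \mathcal{S}] \mid \mathcal{R}]\,d\mu = \int_{R_j} M[f \mid \mathcal{R}]\,d\mu$$

since both functions are constant on cells of $\mathcal{R}$. Applying
property (2) of Proposition 3-2 three times, we have

$$\begin{aligned}
\int_{R_j} M[M[f \mid \mathcal{S}] \mid \mathcal{R}]\,d\mu
&= \int_{R_j} M[f \mid \mathcal{S}]\,d\mu \\
&= \sum_{S_i \subset R_j} \int_{S_i} M[f \mid \mathcal{S}]\,d\mu \\
&= \sum_{S_i \subset R_j} \int_{S_i} f\,d\mu \\
&= \int_{R_j} f\,d\mu \\
&= \int_{R_j} M[f \mid \mathcal{R}]\,d\mu.
\end{aligned}$$

The proofs of (3) and (4) use the same technique and are left to the
reader. Property (2) follows from (1) with $\mathcal{R}$ taken as the
trivial partition $\{\Omega\}$.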
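
Properties (1) and (2) of Proposition 3-3 can be checked exactly on a finite space. The sketch below assumes a hypothetical setting chosen for illustration: three fair coin tosses, the coarse partition determined by the first toss, the finer partition determined by the first two tosses, and $f$ equal to the number of heads. The helper names are not from the text.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product("HT", repeat=3))  # three fair tosses
MU = Fraction(1, 8)                    # measure of each point

def cond_mean_on_partition(f, key):
    """Return M[f | partition] as a function on OMEGA.
    The cells of the partition are the level sets of `key`."""
    cells = {}
    for w in OMEGA:
        cells.setdefault(key(w), []).append(w)
    table = {}
    for cell in cells.values():
        cell_measure = len(cell) * MU
        average = sum(f(w) * MU for w in cell) / cell_measure
        for w in cell:
            table[w] = average
    return lambda w: table[w]

def heads(w):
    return w.count("H")

def coarse(w):   # partition by the first toss
    return w[0]

def fine(w):     # refinement: partition by the first two tosses
    return w[:2]

inner = cond_mean_on_partition(heads, fine)
lhs = cond_mean_on_partition(inner, coarse)   # M[ M[f | S] | R ]
rhs = cond_mean_on_partition(heads, coarse)   # M[ f | R ]
mean_of_inner = sum(inner(w) * MU for w in OMEGA)
```

The functions `lhs` and `rhs` agree at every point, and `mean_of_inner` equals the unconditional mean of `heads`, which is property (2).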



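Definition 3-4: The cross partition $\mathcal{R} \otimes \mathcal{S}$ of two
partitions $\mathcal{R}$ and $\mathcal{S}$ is the family of sets defined by

$$\mathcal{R} \otimes \mathcal{S} = \{R_i \cap S_j \mid R_i \in \mathcal{R},\ S_j \in \mathcal{S},\ R_i \cap S_j \ne \emptyset\}.$$

For example,

[Figure: a cross partition $\mathcal{R} \otimes \mathcal{S}$.]

Since the intersection of measurable sets is measurable and since the
sets $\{R_i \cap S_j\}$ are disjoint and exhaustive, a cross partition is a
partition.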

In Example 2 of Section 2-7 we noted that every denumerable-valued
random variable determines a natural partition of $\Omega$. We call the
partition associated with the denumerable-valued random variable $g$,
$\mathcal{R}^g$. In terms of the natural partition induced by $g$, we
define the conditional mean of $f$ given $g$ by

$$M[f \mid g] = M[f \mid \mathcal{R}^g].$$

Then $M[f \mid g]$ is constant on sets where $g$ is constant, and on the set
where $g = c$ for some constant $c$, $M[f \mid g]$ has the value
$M[f \mid g = c]$.

Observing that the operation of forming the cross partition of two
partitions is both associative and commutative, we define more
generally

$$M[f \mid f_0 \wedge \cdots \wedge f_n] = M[f \mid \mathcal{R}^{f_0} \otimes \cdots \otimes \mathcal{R}^{f_n}].$$





If $p$ is a statement with truth set $P$, we define

$$\Pr[p \mid f_0 \wedge \cdots \wedge f_n] = M[\chi_P \mid f_0 \wedge \cdots \wedge f_n],$$

where $\chi_P$ is the characteristic function of the set $P$.

A sequence of random variables $\{f_n\}$ is said to be independent if, for
every $n$ and for every $A$,

$$\Pr[f_{n+1} \in A \mid f_0 \wedge \cdots \wedge f_n] = \Pr[f_{n+1} \in A]$$

almost everywhere. The reader should show that if $\{f_n\}$ is a sequence
of independent random variables, then

$$M[f_{n+1} \mid f_0 \wedge \cdots \wedge f_n] = M[f_{n+1}].$$

In the special case in which, for each $A$, $\Pr[f_n \in A]$ does not depend
on $n$, the random variables are said to be identically distributed.

2. Properties of martingales 

With the background of Section 3-1, we proceed to define martingales
and to give several examples of them. We still work with the probability
space $(\Omega, \mathcal{B}, \mu)$ and a denumerable set of states $S$. We
assume, however, that the set $S$ is a subset of the extended real number
system.

Definition 3-5: Let $\{f_n\}$ be a sequence of denumerable-valued random
variables, and suppose
$\mathcal{R}_0 \subset \mathcal{R}_1 \subset \cdots$ is an increasing
sequence of partitions of $\Omega$. The pair $(f_n, \mathcal{R}_n)$ is
called a martingale if three conditions are satisfied:

(1) $M[|f_n|]$ is finite for each $n$.

(2) $f_n$ is constant on cells of $\mathcal{R}_n$.

(3) $f_n = M[f_{n+1} \mid \mathcal{R}_n]$.

If (1) and (2) are satisfied and (3) is replaced by
$f_n \ge M[f_{n+1} \mid \mathcal{R}_n]$, then $(f_n, \mathcal{R}_n)$ is a
supermartingale. If (1) and (2) are satisfied and (3) is replaced by
$f_n \le M[f_{n+1} \mid \mathcal{R}_n]$, then $(f_n, \mathcal{R}_n)$ is a
submartingale.

For a martingale, condition (3) in Definition 3-5 implies condition
(2), but for a supermartingale or a submartingale it does not.

When we defined partitions in Section 2-7, we noted that every
partition $\mathcal{R}$ determines a Borel field $\mathcal{R}^*$. The Borel
field $\mathcal{R}_n^*$ satisfies $\mathcal{R}_n^* \subset \mathcal{B}$.
Since $\mathcal{R}_n \subset \mathcal{R}_{n+1}$ implies
$\mathcal{R}_n^* \subset \mathcal{R}_{n+1}^*$, we see that a martingale is a
stochastic process. The reader should notice that the condition that $f_n$
is constant on cells of $\mathcal{R}_n$ is equivalent to the condition that
$f_n$ is measurable with respect to $\mathcal{R}_n^*$.

Throughout the discussion of martingales, it is convenient to keep in
mind the idea of a fair game, which we shall introduce as our first
example below. We shall see that the fair game is a special case of
the following situation. Let $\{f_n\}$ be a sequence of denumerable-valued
random variables defined on a probability space such that $M[|f_j|]$ is
finite. Setting
$\mathcal{R}_n = \mathcal{R}^{f_0} \otimes \cdots \otimes \mathcal{R}^{f_n}$,
we see that the sequence $\{\mathcal{R}_n\}$ is clearly increasing.
Therefore, only condition (3) of Definition 3-5 need be satisfied for
$(f_n, \mathcal{R}_n)$ to be a martingale. In particular, we see that such
a sequence of random variables forms a martingale if and only if

$$M[f_{n+1} \mid f_0 \wedge f_1 \wedge \cdots \wedge f_n] = f_n.$$

When the partitions $\mathcal{R}_n$ are obtained from the $f_n$ in the way
we have just described, we agree to refer to the pair
$(f_n, \mathcal{R}_n)$ simply as $\{f_n\}$.

We shall give three examples of martingales at this time. More 
examples will appear after we introduce Markov chains. 

(1) Let $\{y_n\}$ be a sequence of independent random variables with
denumerable range and let $s_n = y_0 + \cdots + y_n$ represent the $n$th
partial sum. Then $\{s_n\}$ with its partition obtained from the $s_n$ in
the natural way is a martingale if and only if $M[y_k] = 0$ for every $k$.
We have

$$\begin{aligned}
M[s_{n+1} \mid s_0 \wedge \cdots \wedge s_n]
&= M[y_{n+1} + s_n \mid s_0 \wedge \cdots \wedge s_n] \\
&= M[y_{n+1} \mid s_0 \wedge \cdots \wedge s_n] + M[s_n \mid s_0 \wedge \cdots \wedge s_n] \\
&= M[y_{n+1} \mid s_0 \wedge \cdots \wedge s_n] + s_n \\
&= M[y_{n+1} \mid y_0 \wedge \cdots \wedge y_n] + s_n \\
&= M[y_{n+1}] + s_n \quad \text{by independence} \\
&= s_n \quad \text{if and only if } M[y_{n+1}] = 0.
\end{aligned}$$

A special case in which the $y_n$ have identical distributions is the
sequence of plays of a game of chance. A fair game is one in which the
expected fortune $M[s_{n+1} \mid s_0 \wedge \cdots \wedge s_n]$ at any time
$n + 1$, given the play through time $n$, is equal to the actual fortune
$s_n$ at time $n$. Matching pennies is a fair game, whereas roulette
is unfair. A game like roulette that is favorable to the house is a
supermartingale. From the calculation above, we see that a game is
fair if and only if the mean amount won in each round is zero.
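
The calculation in Example 1 can be checked exactly for unit stakes. The sketch below is an illustration under stated assumptions rather than anything from the text: each round wins or loses one unit, `p_win` is the probability of winning a round, and the value 18/38 used for roulette is an assumption describing an even-money bet on a wheel with 38 slots. Conditioning on $s_0, \ldots, s_n$ is the same as conditioning on $y_0, \ldots, y_n$, so each history of outcomes is one cell.

```python
from fractions import Fraction
from itertools import product

def conditional_drift(history, p_win):
    """M[s_{n+1} | y_0, ..., y_n] - s_n on the cell given by `history`."""
    s_n = sum(history)
    expected_next = p_win * (s_n + 1) + (1 - p_win) * (s_n - 1)
    return expected_next - s_n

def classify(p_win, rounds=4):
    """Classify the partial sums by the sign of the drift on every cell."""
    drifts = [conditional_drift(h, p_win)
              for h in product((1, -1), repeat=rounds)]
    if all(d == 0 for d in drifts):
        return "martingale"
    if all(d <= 0 for d in drifts):
        return "supermartingale"
    if all(d >= 0 for d in drifts):
        return "submartingale"
    return "none"
```

For instance, `classify(Fraction(1, 2))` corresponds to matching pennies, and `classify(Fraction(18, 38))` to the assumed roulette bet.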

(2) A particle moves on a line stopping at points whose coordinates
are integers. At each step it moves $n$ units to the right with
probability $p_n$, where $n = \ldots, -2, -1, 0, 1, 2, \ldots$, only
finitely many of the $p_n$ are different from zero, $p_n > 0$ for some
negative value and some positive value of $n$, and $\sum p_n = 1$. The
particle's position after $j$ steps is $x_j$. Set
$f(s) = \sum_n p_n s^n$ and consider the positive roots of the equation
$f(s) = 1$. Since $f''(s) > 0$ for $s > 0$, there are at most two
roots of the equation, and since $f(1) = 1$, either one or two roots exist.
As $s \to 0$ or $\infty$, $f(s) \to \infty$. Hence either 1 is a minimum
point or there are two positive roots. If 1 is a minimum point, then
$f'(1) = 0$, and hence $\{x_j\}$ is a martingale, and if $r$ is a root other
than 1, then $\{r^{x_j}\}$ is a martingale. The details of verifying these
assertions are left to the reader. We shall need to use these results
later.
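
A small numerical instance may help with the verification left to the reader. The sketch below assumes a hypothetical step distribution that is not in the text, $p_{-1} = 2/3$ and $p_{1} = 1/3$, for which $f(s) = 1$ reduces to $s^2 - 3s + 2 = 0$ with roots 1 and 2. It checks that $r = 2$ satisfies $f(r) = 1$ and that the one-step conditional mean of $r^{x_{j+1}}$ given $x_j = x$ equals $r^x$.

```python
from fractions import Fraction

# Hypothetical step distribution: n units to the right with probability p_n.
STEP_PROBS = {-1: Fraction(2, 3), 1: Fraction(1, 3)}

def f(s):
    """f(s) = sum over n of p_n * s**n, evaluated exactly."""
    return sum(p * Fraction(s) ** n for n, p in STEP_PROBS.items())

def mean_step():
    """f'(1) = sum over n of n * p_n, the mean displacement per step."""
    return sum(n * p for n, p in STEP_PROBS.items())

def one_step_mean_of_power(x, r):
    """M[r**x_{j+1} | x_j = x] for the particle described above."""
    return sum(p * Fraction(r) ** (x + n) for n, p in STEP_PROBS.items())

R = 2  # the root of f(s) = 1 other than 1 for this distribution
```

Because `mean_step()` is not zero here, the positions $x_j$ themselves do not form a martingale in this instance, while the powers $2^{x_j}$ do.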

(3) Let $\Omega$ be the closed unit interval $[0, 1]$ on the real line, let
the measure of an interval be its length, and let $\mathcal{R}_n$ be the
partition
$\{[0, 2^{-n}], (2^{-n}, 2 \cdot 2^{-n}], \ldots, ((2^n - 1)2^{-n}, 1]\}$.
Let $f$ be a monotone increasing function on $[0, 1]$ and let $f_n$ be a
function which is constant on the interval $(c2^{-n}, (c + 1)2^{-n}]$ and
whose value at any point in the interval is

$$2^{n}\big(f((c + 1)2^{-n}) - f(c2^{-n})\big).$$

Thus $f_n$ is an approximation to the derivative $f'$, if it exists. The
reader should verify that $(f_n, \mathcal{R}_n)$ is a martingale.
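
The verification requested in Example 3 comes down to one identity: each cell of $\mathcal{R}_n$ splits into two cells of $\mathcal{R}_{n+1}$ of equal length, so the conditional mean of $f_{n+1}$ on the parent cell is the average of its two values there. The sketch below checks this exactly, assuming the hypothetical choice $f(x) = x^2$; the function names are illustrative.

```python
from fractions import Fraction

def func(x):
    """Hypothetical monotone increasing function on [0, 1]."""
    return x * x

def difference_quotient(n, c):
    """Value of f_n on the cell (c * 2**-n, (c + 1) * 2**-n]."""
    h = Fraction(1, 2 ** n)
    return (func((c + 1) * h) - func(c * h)) / h

def conditional_mean_of_next(n, c):
    """M[f_{n+1} | R_n] on cell c of R_n; the two halves have equal measure."""
    left = difference_quotient(n + 1, 2 * c)
    right = difference_quotient(n + 1, 2 * c + 1)
    return (left + right) / 2
```

The two function values telescope, which is why the identity holds for any choice of `func`, not only for the square.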

3. A first martingale systems theorem 

In the first example in the preceding section, we saw that martingales 
bear some relation to gambling. A fair game is a martingale, a game 
favorable to the house is a supermartingale, and a game favorable to 
the player is a submartingale. A gambling system is a device to take 
advantage of the nature of a game of chance in order to increase the 
player’s expected fortune. Systems theorems are theorems which 
prove that certain classes of gambling systems do not work. For our 
first systems theorem, which we shall need in the proof in the next 
section, we require a lemma. 

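Lemma 3-6: Let $\{\mathcal{R}_n\}$ be an increasing sequence of partitions
and let $\{f_n\}$ be a sequence of random variables. Suppose $E$ is a set in
the Borel field $\mathcal{R}_k^*$.

If $(f_n, \mathcal{R}_n)$ is a submartingale, then
$\int_E f_{k+1}\,d\mu \ge \int_E f_k\,d\mu$.

If $(f_n, \mathcal{R}_n)$ is a martingale, then
$\int_E f_{k+1}\,d\mu = \int_E f_k\,d\mu$.

If $(f_n, \mathcal{R}_n)$ is a supermartingale, then
$\int_E f_{k+1}\,d\mu \le \int_E f_k\,d\mu$.

Proof: We shall prove only the first assertion. By Proposition 3-2,
$\int_E f_{k+1}\,d\mu = \int_E M[f_{k+1} \mid \mathcal{R}_k]\,d\mu$, and
since $(f_n, \mathcal{R}_n)$ is a submartingale, we know that

$$M[f_{k+1} \mid \mathcal{R}_k] \ge f_k.$$

The result follows immediately from integrating this inequality.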

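Proposition 3-7: Let $(f_n, \mathcal{R}_n)$ be a submartingale, and suppose
that $\{\epsilon_n\}$ is a sequence of random variables such that
$\epsilon_n(\omega) = 1$ or $0$ for every $\omega$ and such that the set
$\{\omega \mid \epsilon_n(\omega) = 1\}$ is a set in the Borel field
generated by $\mathcal{R}_n$. Define

$$\tilde{f}_n = f_0 + \epsilon_0(f_1 - f_0) + \epsilon_1(f_2 - f_1) + \cdots + \epsilon_{n-1}(f_n - f_{n-1}).$$

Then $(\tilde{f}_n, \mathcal{R}_n)$ is a submartingale and
$M[\tilde{f}_n] \le M[f_n]$.

Remark: Analogous results hold for martingales and for supermartingales,
but we need only what is proved here.

Proof of proposition: We first show that
$\tilde{f}_n \le M[\tilde{f}_{n+1} \mid \mathcal{R}_n]$. We have

$$\begin{aligned}
M[\tilde{f}_{n+1} \mid \mathcal{R}_n]
&= M[\tilde{f}_n + \epsilon_n(f_{n+1} - f_n) \mid \mathcal{R}_n] \\
&= M[\tilde{f}_n \mid \mathcal{R}_n] + M[\epsilon_n(f_{n+1} - f_n) \mid \mathcal{R}_n] \\
&= \tilde{f}_n + M[\epsilon_n(f_{n+1} - f_n) \mid \mathcal{R}_n].
\end{aligned}$$

Since $\{\omega \mid \epsilon_n(\omega) = 1\}$ is the union of cells of
$\mathcal{R}_n$, $\epsilon_n$ is constant on cells of $\mathcal{R}_n$. Thus
by Proposition 3-3, the above expression

$$\begin{aligned}
&= \tilde{f}_n + \epsilon_n M[(f_{n+1} - f_n) \mid \mathcal{R}_n] \\
&= \tilde{f}_n + \epsilon_n\big(M[f_{n+1} \mid \mathcal{R}_n] - f_n\big),
\end{aligned}$$

which is

$$\ge \tilde{f}_n$$

because $(f_n, \mathcal{R}_n)$ is a submartingale and $\epsilon_n$ is
non-negative. It remains to be shown that $M[f_n - \tilde{f}_n] \ge 0$; we
prove the result by induction on $n$.

For $n = 0$, $\tilde{f}_0 = f_0$ and $M[f_0 - \tilde{f}_0] = 0$. Suppose we
have proved that $\int_\Omega (f_k - \tilde{f}_k)\,d\mu \ge 0$. Then we have
$\tilde{f}_{k+1} = \tilde{f}_k + \epsilon_k(f_{k+1} - f_k)$, and when we
subtract both sides from $f_{k+1}$, we get

$$\begin{aligned}
f_{k+1} - \tilde{f}_{k+1} &= f_{k+1} - \tilde{f}_k - \epsilon_k(f_{k+1} - f_k) \\
&= (1 - \epsilon_k)(f_{k+1} - f_k) + (f_k - \tilde{f}_k).
\end{aligned}$$

Thus

$$\begin{aligned}
\int_\Omega (f_{k+1} - \tilde{f}_{k+1})\,d\mu
&\ge \int_\Omega (1 - \epsilon_k)(f_{k+1} - f_k)\,d\mu \quad \text{by hypothesis} \\
&= \int_{\{\epsilon_k(\omega) = 0\}} (f_{k+1} - f_k)\,d\mu \\
&\ge 0 \quad \text{by Lemma 3-6.}
\end{aligned}$$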

To see the connection of Proposition 3-7 to gambling systems, we
again consider Example 1 of Section 3-2. We take the partial sums $s_n$
as the $f_n$, and we observe that the differences $(f_{k+1} - f_k)$ in the
definition of $\tilde{f}_n$ are simply the random variables $y_{k+1}$. The
$\tilde{s}_n$ become modified fortunes, fortunes changed by not playing in
every round of the game. When $\epsilon_k = 1$, the player participates in
the $(k + 1)$st game; and when $\epsilon_k = 0$, he does not. The whole set
of $\epsilon$'s, therefore, represents a gambling system; the condition
that the set of paths for which $\epsilon_n(\omega) = 1$ be the union of
cells of $\mathcal{R}_n$ is the condition that the system not depend on any
knowledge of the future. For the special case in which the process is a
submartingale, the expected fortune after time $n + 1$ is greater than or
equal to the fortune at time $n$ and the game is favorable for the player.
The content of Proposition 3-7 is that the player's expected fortune after
time $n + 1$ would not have been increased by a system which caused him not
to play in certain rounds.
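
The conclusion of Proposition 3-7 can be checked exactly for a small favorable game. The sketch below rests on assumptions made only for illustration: unit stakes, a win probability of 3/5 per round, and a hypothetical system, `play_after_loss`, that takes part in round $k + 1$ only when round $k$ was lost. The system sees outcomes through time $k$ only, which is the requirement that $\epsilon_k$ be constant on cells of $\mathcal{R}_k$.

```python
from fractions import Fraction
from itertools import product

P_WIN = Fraction(3, 5)  # assumed favorable game: partial sums form a submartingale

def path_weight(ys):
    """Probability of the sequence of round outcomes ys."""
    weight = Fraction(1)
    for y in ys:
        weight *= P_WIN if y == 1 else 1 - P_WIN
    return weight

def play_after_loss(history):
    """epsilon_k: 1 if the most recent round was lost, otherwise 0."""
    return 1 if history[-1] == -1 else 0

def fortunes(ys, system):
    """Return (f_n, modified f_n) for outcomes ys = (y_0, ..., y_n)."""
    actual = ys[0]
    modified = ys[0]
    for k in range(len(ys) - 1):
        eps = system(ys[:k + 1])      # uses outcomes through time k only
        actual += ys[k + 1]
        modified += eps * ys[k + 1]
    return actual, modified

def expected_fortunes(n, system):
    """Exact means of f_n and of the modified fortune over all paths."""
    mean_actual = Fraction(0)
    mean_modified = Fraction(0)
    for ys in product((1, -1), repeat=n + 1):
        weight = path_weight(ys)
        actual, modified = fortunes(ys, system)
        mean_actual += weight * actual
        mean_modified += weight * modified
    return mean_actual, mean_modified
```

With $n = 4$ the two means are $1$ and $13/25$: skipping rounds in a favorable game cannot raise the expected fortune.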

4. Martingale convergence and a second systems theorem 

In this section we shall prove two theorems which will be of great use 
in our treatment of Markov chains. The two results will indicate the 
value of recognizing martingales when they appear in our later work. 

Definition 3-8: Let $\{f_n\}$ be a sequence of random variables defined on
points $\omega$, and let $r$ and $s$ be numbers with $r < s$. An upcrossing
of $[r, s]$ is said to occur between $n - k$ and $n$ for the point $\omega$
if these conditions are satisfied:

(1) $f_{n-k}(\omega) \le r$

(2) $r < f_{n-k+m}(\omega) < s$ for $0 < m < k$

(3) $f_n(\omega) \ge s$.

The reader should notice that no two upcrossings overlap.




After three preliminary results, we proceed with a proof of the 
Martingale Convergence Theorem. Proposition 3-11 is known as the 
Upcrossing Lemma. 

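Lemma 3-9: If $(f_n, \mathcal{R}_n)$ and $(g_n, \mathcal{R}_n)$ are
submartingales, then so is $(\sup(f_n, g_n), \mathcal{R}_n)$.

Proof:

$$\begin{aligned}
M[|\sup(f_n, g_n)|] &\le M[\sup(|f_n|, |g_n|)] \\
&= \int_{\{|f_n| \ge |g_n|\}} |f_n|\,d\mu + \int_{\{|f_n| < |g_n|\}} |g_n|\,d\mu < \infty.
\end{aligned}$$

The function $\sup(f_n, g_n)$ is clearly constant on cells of
$\mathcal{R}_n$ if $f_n$ and $g_n$ each are. Furthermore,

$$M[\sup(f_{n+1}, g_{n+1}) \mid \mathcal{R}_n] \ge M[f_{n+1} \mid \mathcal{R}_n] \ge f_n$$

and

$$M[\sup(f_{n+1}, g_{n+1}) \mid \mathcal{R}_n] \ge M[g_{n+1} \mid \mathcal{R}_n] \ge g_n$$

so that

$$M[\sup(f_{n+1}, g_{n+1}) \mid \mathcal{R}_n] \ge \sup(f_n, g_n).$$

Lemma 3-10: If $(f_n, \mathcal{R}_n)$ is a martingale,
$M[f_n] = M[f_{n-1}]$. If $(f_n, \mathcal{R}_n)$ is a supermartingale,
$M[f_n] \le M[f_{n-1}]$. If $(f_n, \mathcal{R}_n)$ is a submartingale,
$M[f_n] \ge M[f_{n-1}]$.

Proof: The result is immediate from Lemma 3-6 when $E$ is taken
as $\Omega$.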

Proposition 3-11: Let $(f_n, \mathcal{R}_n)$ be a submartingale, and let
$\beta(\omega)$ be the number of upcrossings of $[r, s]$ between times 0 and
$n$. Then

$$M[\beta] \le \frac{M[(f_n - r)^+]}{s - r} \le \frac{M[|f_n|] + |r|}{s - r}.$$

Proof: Consider first the special case $f_k \ge 0$ for $0 \le k \le n$ and
$r = 0$. Let $\tilde{f}_n$ be defined as in Proposition 3-7 with the
$\epsilon$'s to be specified. For a given path $\omega$, define, by
induction on $m$, $\epsilon_m(\omega) = 1$ whenever $f_m(\omega) = 0$, and
let $\epsilon_{m+k}(\omega) = 1$ as long as $f_{m+k}(\omega) < s$. As
soon as $f_m(\omega) \ge s$, require that $\epsilon_m(\omega) = 0$ until $m$
is large enough so that $f_m(\omega) = 0$ again. Then $\tilde{f}_n$
measures the increase in the sequence $f_0(\omega), \ldots, f_n(\omega)$
during upcrossings (plus a possible “partial upcrossing” at the end) and is
greater than or equal to the minimum increase in each upcrossing multiplied
by the number of upcrossings. That is,

$$s \cdot \beta(\omega) \le \tilde{f}_n(\omega).$$

Hence

$$s\,M[\beta] \le M[\tilde{f}_n],$$

which by Proposition 3-7 is

$$\le M[f_n].$$

Therefore,

$$M[\beta] \le (1/s)\,M[f_n]$$

and the special case is proved. For a general sequence $\{f_k\}$ and
general $r$, consider the function $(f_k - r)^+$, which is the supremum of
the zero function and the function $f_k - r$. It is readily verified that
constant functions are martingales and that the difference of a
submartingale and a martingale is a submartingale. Thus
$(f_k - r, \mathcal{R}_k)$ is a submartingale and by Lemma 3-9
$((f_k - r)^+, \mathcal{R}_k)$ is a submartingale. Applying the special
case proved above to $(f_k - r)^+$ and upcrossings of $[0, s - r]$, we find

$$\begin{aligned}
(s - r)\,M[\beta] &\le M[(f_n - r)^+] \\
&\le M[|f_n - r|] \\
&\le M[|f_n| + |r|] \\
&= M[|f_n|] + |r|
\end{aligned}$$

and the proof of the Upcrossing Lemma is complete.
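
The counting in Definition 3-8 and the first inequality of the Upcrossing Lemma can be illustrated on a small example. The sketch below is a hedged illustration, not the book's argument: `count_upcrossings` is a hypothetical helper that counts completed upcrossings of $[r, s]$ in a finite sequence, and `upcrossing_bound_check` evaluates both sides of the inequality exactly for the partial sums of fair unit-stake rounds, which form a martingale and hence a submartingale.

```python
from fractions import Fraction
from itertools import product

def count_upcrossings(values, r, s):
    """Number of upcrossings of [r, s]: a value <= r followed later by a
    value >= s, with counting restarted after each completed upcrossing."""
    count = 0
    armed = False              # True once a value <= r has been seen
    for v in values:
        if v <= r:
            armed = True
        elif v >= s and armed:
            count += 1
            armed = False
    return count

def upcrossing_bound_check(n_rounds, r, s):
    """Return (M[beta], M[(f_n - r)^+] / (s - r)) for a fair +-1 game,
    computed exactly over all paths of n_rounds rounds."""
    weight = Fraction(1, 2 ** n_rounds)
    mean_beta = Fraction(0)
    mean_positive_part = Fraction(0)
    for ys in product((1, -1), repeat=n_rounds):
        partial_sums = []
        total = 0
        for y in ys:
            total += y
            partial_sums.append(total)
        mean_beta += weight * count_upcrossings(partial_sums, r, s)
        mean_positive_part += weight * max(partial_sums[-1] - r, 0)
    return mean_beta, mean_positive_part / (s - r)
```

For example, the sequence `[0, 1, 3, 0, 2, 5]` has two upcrossings of $[0, 3]$, and a sequence that never drops to $r$ has none.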



Theorem 3-12: If $(f_n, \mathcal{R}_n)$ is a submartingale with the
property that $M[|f_n|] \le K < \infty$ for all $n$, then

$$\lim_{n \to \infty} f_n(\omega)$$

exists and is finite for almost all points $\omega$.

Proof: Failure of almost-everywhere convergence means that there
exists a set of points $\omega$ of positive measure for which the sequence
$\{f_n(\omega)\}$ diverges. At least one of two things must happen. Either
for each fixed $\omega$ in a set of positive measure $f_n(\omega)$
oscillates infinitely often above and below rationals $r(\omega)$ and
$s(\omega)$ with $r(\omega) < s(\omega)$, or else $|f_n(\omega)|$ diverges
to $+\infty$ on a set of positive measure. We consider the cases
separately.

(1) Suppose $|f_n(\omega)|$ diverges to $+\infty$ on a set $E$ of positive
measure $m$. Then by Fatou's Theorem

$$\begin{aligned}
\int_E \liminf_n |f_n(\omega)|\,d\mu &\le \liminf_n \int_E |f_n(\omega)|\,d\mu \\
&\le \liminf_n \int_\Omega |f_n(\omega)|\,d\mu \\
&= \liminf_n M[|f_n|] \\
&\le K.
\end{aligned}$$

But $\liminf |f_n(\omega)| = +\infty$ on $E$, and $E$ has positive measure
$m$. Thus the left side of the inequality is infinite, and we have arrived
at a contradiction.

(2) Suppose $f_n(\omega)$ oscillates infinitely often above and below
rationals $r(\omega)$ and $s(\omega)$ on a set of positive measure $m$.
Order the set of all pairs of rationals (which is a denumerable set) and
call the $k$th pair $(r_k, s_k)$. Consider the denumerable family of sets
defined by $A_k = \{\omega \mid f_n(\omega)$ oscillates infinitely often
above and below the rationals of the pair $(r_k, s_k)\}$. It is possible
for more than one set to have the same point in it, but, on the other hand,
every point $\omega$ for which $f_n(\omega)$ oscillates infinitely often
is in some $A_k$. Therefore,

$$\sum_k \mu(A_k) \ge \mu\Big(\bigcup_k A_k\Big) = m > 0$$

and there must exist an $\ell$ for which $\mu(A_\ell) > 0$. That is, for
every $\omega$ in a set $A_\ell$ of positive measure, $f_n(\omega)$
oscillates infinitely often above and below fixed rationals $r$ and $s$ with
$r < s$. Let $\beta_n(\omega)$ be the number of upcrossings of $[r, s]$ by
$f_0(\omega), \ldots, f_n(\omega)$. By Proposition 3-11,

$$\begin{aligned}
M[\beta_n] &\le \frac{M[|f_n|] + |r|}{s - r} \\
&\le \frac{K + |r|}{s - r} \\
&= c \quad \text{for every } n.
\end{aligned}$$

Furthermore, the $\beta_n$ are non-negative and increasing with $n$ to a
function $\beta$, so that $M[\beta] = \lim M[\beta_n] \le c$ by the Monotone
Convergence Theorem. But $M[\beta] = +\infty$ since there are infinitely
many upcrossings on a set of positive measure. This contradiction
establishes the Martingale Convergence Theorem.

Corollary 3-13: Every non-negative supermartingale converges to a
finite-valued function almost everywhere. In particular, every
non-negative martingale converges almost everywhere.

Proof: If $(f_n, \mathcal{R}_n)$ is a non-negative supermartingale, then
$(-f_n, \mathcal{R}_n)$ is a submartingale. Since $f_n \ge 0$,
$|-f_n| = f_n$ and hence $M[|-f_n|] \le M[f_0]$ by Lemma 3-10. Therefore,
$\{-f_n\}$ converges almost everywhere by Theorem 3-12, and so does
$\{f_n\}$.

We postpone a discussion of applications of Theorem 3-12 and
Corollary 3-13 to the next section. We shall find that the corollary is
used more frequently than the theorem itself.

A random time which is finite almost everywhere is called a random
stopping time or simply a stopping time. If $t$ is a stopping time, then
the set $\{\omega \mid t(\omega) = +\infty\}$ has measure zero. If
$\{f_n\}$ is a sequence of random variables, we define a function $f_t$
almost everywhere by

$$f_t(\omega) = f_n(\omega) \quad \text{if } t(\omega) = n.$$

Since

$$\{\omega \mid f_t(\omega) < c\} = \bigcup_{n=0}^{\infty} \big(\{\omega \mid t(\omega) = n\} \cap \{\omega \mid f_n(\omega) < c\}\big),$$

$f_t$ is a random variable.



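Lemma 3-14: If $(f_n, \mathcal{R}_n)$ is a martingale and if $t$ is a
stopping time for which $M[f_t]$ exists, then for any $n$

$$M[f_t] = \int_{\{t \le n\}} f_n\,d\mu + \int_{\{t > n\}} f_t\,d\mu.$$

Proof: We have

$$\begin{aligned}
M[f_t] &= \sum_{k=0}^{n} \int_{\{t = k\}} f_t\,d\mu + \int_{\{t > n\}} f_t\,d\mu \\
&= \sum_{k=0}^{n} \int_{\{t = k\}} f_k\,d\mu + \int_{\{t > n\}} f_t\,d\mu,
\end{aligned}$$

which by Lemma 3-6

$$\begin{aligned}
&= \sum_{k=0}^{n} \int_{\{t = k\}} f_n\,d\mu + \int_{\{t > n\}} f_t\,d\mu \\
&= \int_{\{t \le n\}} f_n\,d\mu + \int_{\{t > n\}} f_t\,d\mu.
\end{aligned}$$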



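Theorem 3-15: If $(f_n, \mathcal{R}_n)$ is a martingale and if $t$ is a
stopping time, then $M[f_t] = M[f_0]$ if

(1) $M[|f_t|] < \infty$, and

(2) $\lim_{n \to \infty} \int_{\{t > n\}} f_n\,d\mu = 0$.

Remark: Analogous results hold for submartingales and for
supermartingales. Inequalities replace the equality in the conclusion.

Proof of theorem: By (1), $M[f_t]$ exists, so that Lemma 3-14
applies. Thus, for any $n$,

$$\begin{aligned}
M[f_t] &= \int_{\{t \le n\}} f_n\,d\mu + \int_{\{t > n\}} f_t\,d\mu \\
&= \int_\Omega f_n\,d\mu - \int_{\{t > n\}} f_n\,d\mu + \int_{\{t > n\}} f_t\,d\mu,
\end{aligned}$$

which by Lemma 3-10

$$= \int_\Omega f_0\,d\mu - \int_{\{t > n\}} f_n\,d\mu + \int_{\{t > n\}} f_t\,d\mu.$$

Using condition (1) together with Corollary 1-17 and the complete
additivity of the integral as a set function, we see that

$$\lim_{n \to \infty} \int_{\{t > n\}} f_t\,d\mu = 0.$$

Since $\lim_{n \to \infty} \int_{\{t > n\}} f_n\,d\mu = 0$ by hypothesis, we
have $M[f_t] = \int_\Omega f_0\,d\mu = M[f_0]$.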



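Corollary 3-16: Suppose $(f_n, \mathcal{R}_n)$ is a martingale defined on a
space $\Omega$ of finite total measure and $t$ is a stopping time. If
$|f_n| \le K$ for all $n$, then $M[f_t] = M[f_0]$.

Proof: We must show the two conditions of Theorem 3-15 are
satisfied. For (1) we have $|f_t| \le K$ by definition, and hence $f_t$ is
integrable. For (2) we have

$$\begin{aligned}
\Big|\int_{\{t > n\}} f_n\,d\mu\Big| &\le \int_{\{t > n\}} |f_n|\,d\mu \\
&\le \int_{\{t > n\}} K\,d\mu \\
&= K\mu(\{\omega \mid t(\omega) > n\}) \\
&\to 0.
\end{aligned}$$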



In terms of gambling systems, the result of Lemma 3-10, namely that
for martingales $M[f_n] = M[f_0]$, states that the expected fortune at any
fixed stopping time is equal to the initial fortune if the game is fair.
That is, the fairness of a game is not altered by deciding in advance to
stop playing at some fixed time. But what about a system where the
player stops according to how the game is going? The system he
adopts is represented by the random time $t$, and Theorem 3-15 and
Corollary 3-16 give sufficient criteria for the game still to be fair.
Corollary 3-16 by itself is a general result; it covers the situation, for
example, where the game stops if either the player or the house goes
bankrupt. If the game does stop under such circumstances, the
corollary states that the fairness of the game is not altered by any
gambling system whatsoever. Similar remarks apply to supermartingales.
If the amount of money that a player has is limited, no
system that he adopts for stopping according to how the game is going
will make an unfair game favorable.

The following proposition is useful in proving that certain random 
times are stopping times. 

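Proposition 3-17: Let $(f_n, \mathcal{R}_n)$ be a martingale, let $t$ be a
random time, and let $\hat{f}_n$ be the stopped process with

$$\hat{f}_n(\omega) = f_{\min(n, t(\omega))}(\omega).$$

Then $(\hat{f}_n, \mathcal{R}_n)$ is a martingale.

Proof: We first note that $M[|\hat{f}_n|] < \infty$ since
$|\hat{f}_n| \le \sum_{j=0}^{n} |f_j|$ and each $f_j$ is integrable. Next,
let $R$ be a cell in $\mathcal{R}_n$ with $\mu(R) \ne 0$. In $R$ we have

$$M[\hat{f}_{n+1} \mid \mathcal{R}_n] = \frac{1}{\mu(R)} \Big[\int_{R \cap \{t \le n\}} f_t\,d\mu + \int_{R \cap \{t > n\}} f_{n+1}\,d\mu\Big]$$

by definition of $\hat{f}_{n+1}$. Since $(f_n, \mathcal{R}_n)$ is a
martingale and $\{t > n\}$ is in $\mathcal{R}_n^*$, the above expression by
Lemma 3-6 is

$$\begin{aligned}
&= \frac{1}{\mu(R)} \Big[\int_{R \cap \{t \le n\}} f_t\,d\mu + \int_{R \cap \{t > n\}} f_n\,d\mu\Big] \\
&= \frac{1}{\mu(R)} \Big[\int_{R \cap \{t \le n\}} \hat{f}_n\,d\mu + \int_{R \cap \{t > n\}} \hat{f}_n\,d\mu\Big] \\
&= \frac{1}{\mu(R)} \int_R \hat{f}_n\,d\mu.
\end{aligned}$$

The last expression equals $\hat{f}_n$ since $\hat{f}_n$ is constant on $R$.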
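
Proposition 3-17 and Corollary 3-16 can be illustrated by an exact computation for a stopped fair game. The sketch below makes assumptions that are not in the text: the fortune starts at 0, each round moves it by $+1$ or $-1$ with probability 1/2, and play stops on first reaching `lower` or `upper`. The function propagates the exact distribution of the stopped fortune $f_{\min(n, t)}$.

```python
from fractions import Fraction

def stopped_walk_distribution(n, lower, upper):
    """Distribution of the stopped fortune after n rounds for a fair +-1
    game started at 0 and stopped on reaching `lower` or `upper`."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for pos, mass in dist.items():
            if pos in (lower, upper):
                # Already stopped: the fortune no longer changes.
                nxt[pos] = nxt.get(pos, 0) + mass
            else:
                for step in (1, -1):
                    nxt[pos + step] = nxt.get(pos + step, 0) + mass / 2
        dist = nxt
    return dist

def mean(dist):
    """Mean of a distribution given as {value: probability}."""
    return sum(pos * mass for pos, mass in dist.items())

def stopped_mass(dist, lower, upper):
    """Probability that the game has stopped, that is, Pr[t <= n]."""
    return dist.get(lower, 0) + dist.get(upper, 0)
```

The mean of the stopped fortune stays equal to the initial fortune for every $n$, as Lemma 3-10 requires of a martingale, and the probability of having stopped approaches 1, so the random time is a stopping time in this instance.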

The application of Proposition 3-17 is this: Let $(f_n, \mathcal{R}_n)$ be
an integer-valued martingale, and let $S$ be a set of integers. Let the
martingale almost surely have the property that it can be constant from
some time on only for values in $S$ (and possibly for no values). We stop
the martingale the first time it takes on a value in $S$. That is, we let
$t$ be the first time that $f_n$ is in $S$, and we introduce the stopped
process $\hat{f}_n$. If the values of $\hat{f}_n$ are bounded from below or
from above, then the “stopped process” is almost sure to stop. The proof
proceeds as follows.

First, assume that $\hat{f}_n \ge 0$. Then $(\hat{f}_n, \mathcal{R}_n)$ is
a non-negative martingale, by Proposition 3-17, which must converge a.e. to
a finite value depending on $\omega$. Since by hypothesis these values must
be in $S$ for a.e. $\omega$, the process almost surely stops. Next, if
$\{\hat{f}_n\}$ is bounded below, apply the result for non-negative
martingales to $\hat{f}_n$ plus a suitable constant. Finally, if
$\hat{f}_n \le c$, apply the result for $\{\hat{f}_n\}$ bounded below to
$\{-\hat{f}_n\}$.

These results are used in the next section in Examples 1 and 4. In
Example 1 a fair game is stopped when it leaves a certain finite set,
whereas in Example 4 it is stopped when a positive value is reached for
the first time. By the above argument these random times are
stopping times.

Proposition 3-18: Let C C - - - be an increasing sequence of 
partitions and let be the smallest augmented Borel field containing 

the field [J ^n*^ f be a random variable measurable with respect 

to and having finite mean, and set = M[f | Then (g,^, 

is a martingale, and 

lim g„ = f 

n-* oo 

almost everywhere. 



Proof: We may assume that $f \ge 0$ since the general case follows by
considering $f^+$ and $f^-$ separately. Then $g_n \ge 0$ and $M[|g_n|] = M[f] < \infty$
by conclusion (2) of Proposition 3-3. Since, in addition,

$$M[g_{n+1} \mid \mathcal{R}_n] = M[M[f \mid \mathcal{R}_{n+1}] \mid \mathcal{R}_n] = M[f \mid \mathcal{R}_n] = g_n$$

(the middle equality by (1) of Proposition 3-3), we see that $(g_n, \mathcal{R}_n)$ is a
non-negative martingale. By Corollary 3-13, $g = \lim g_n$ exists a.e. We shall
prove that $g = f$ a.e.

First we prove that the $g_n$ are uniformly integrable. Let $\epsilon > 0$ be
given. Choose, by Proposition 1-47, $\delta > 0$ small enough so that
$\mu(E) < \delta$ implies $\int_E f\,d\mu < \epsilon$. Now

$$N\,\mu(\{g_n \ge N\}) \le M[g_n] = M[f]$$

or

$$\mu(\{g_n \ge N\}) \le \frac{M[f]}{N}.$$

Choose $N$ large enough so that the right side is less than $\delta$. Then for
all $n$ we have

$$\int_{\{g_n \ge N\}} f\,d\mu < \epsilon.$$

Since

$$\int_{\{g_n \ge N\}} g_n\,d\mu = \int_{\{g_n \ge N\}} f\,d\mu$$

by Lemma 3-6, we conclude the $g_n$ are uniformly integrable.
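Let $E$ be any set in $\mathcal{R}_n^*$. For $m \ge n$,

$$\int_E g_m\,d\mu = \int_E f\,d\mu.$$

By uniform integrability and Proposition 1-52,

$$\lim_{m \to \infty} \int_E g_m\,d\mu = \int_E g\,d\mu.$$

Therefore

$$\int_E f\,d\mu = \int_E g\,d\mu$$

for all $E$ in $\bigcup_n \mathcal{R}_n^*$. The two sides of this last equation, considered as
set functions, are equal on $\bigcup_n \mathcal{R}_n^*$. By the uniqueness half of Theorem
1-19 they must be equal on $\mathcal{G}$. That is, $f$ and $g$ are measurable with
respect to $\mathcal{G}$ and satisfy

$$\int_E f\,d\mu = \int_E g\,d\mu, \qquad E \in \mathcal{G}.$$

Taking $E$ successively to be the set where $f > g$ and the set where
$f < g$ and applying Corollary 1-40, we find that $f = g$ a.e.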



5. Examples of convergent martingales 

Four examples will serve at present to illustrate Theorems 3-12 and 
3-15. Each of the first three refers to the correspondingly numbered 
example in Section 2. 

Example 1: The sequence $\{s_n\}$ of $n$th partial sums of the independent
random variables $y_n$ forms a martingale if $M[y_n] = 0$. Suppose the $y_n$
have identical distributions and mean zero, suppose they assume only
the values 0, 1, and $-1$, and suppose that the process is stopped whenever
$s_n(\omega) = M$ or $s_n(\omega) = -N$. (Mean zero implies that the outcomes
$+1$ and $-1$ are equally likely.) The player of the fair game is ruined
if $s_n(\omega)$ ever equals $-N$ and he breaks the bank if $s_n(\omega) = M$. Set

$$p = \Pr[\text{player breaks the bank}]$$

and

$$q = \Pr[\text{player is eventually ruined}].$$

By the remarks following Proposition 3-17, $p + q = 1$. In this
situation, Corollary 3-16 applies and

$$0 = M[s_0] = M[s_t] = pM + q(-N) = pM + (1 - p)(-N)$$

or

$$p = \frac{N}{M + N}$$

and

$$q = \frac{M}{M + N}.$$
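
The value of $p$ in Example 1 can be checked by direct computation. The following is a minimal sketch, not part of the original argument: the function name `break_bank_probability` is a hypothetical choice, and the method (shooting from the left boundary and rescaling) is one of several ways to solve the averaging relation $h(i) = \tfrac12\,(h(i-1) + h(i+1))$ satisfied by $h(i) = \Pr[\text{reach } M \text{ before } -N \mid \text{start at } i]$. Steps of size 0 only delay the game, so they drop out of this relation.

```python
from fractions import Fraction

def break_bank_probability(M, N):
    """Probability that the fair game started at 0 reaches M before -N.

    Assumes integers M >= 1 and N >= 1 (hypothetical helper for illustration).
    """
    # Shoot from the left: fix h(-N) = 0 and an arbitrary slope h(-N+1) = 1,
    # then extend with h(i+1) = 2 h(i) - h(i-1), the averaging relation solved
    # for h(i+1).
    h = {-N: Fraction(0), -N + 1: Fraction(1)}
    for i in range(-N + 1, M):
        h[i + 1] = 2 * h[i] - h[i - 1]
    # Rescale so that the boundary condition h(M) = 1 holds.
    return h[0] / h[M]
```

For instance, `break_bank_probability(3, 2)` should agree with $N/(M+N) = 2/5$.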



Example 2: With the particle moving on the line, suppose there is an
$r > 0$ which is a root of the equation $f(s) = 1$. We shall assume that
$r < 1$. Then $\{r^{x_n}\}$ is a non-negative martingale, and by Corollary 3-13,
$\{r^{x_n}\}$ converges to a limiting function a.e. Since $x_n$ is integer-valued,
this convergence means either

(1) for almost all $\omega$, there is an $N$ such that if $n \ge N$, then $x_n = x_N$, or

(2) $\lim x_n = +\infty$.

Now $x_N = x_{N+1} = x_{N+2} = \cdots = x_{N+k}$ means that the particle fails to
move for $k$ consecutive steps. Since such a thing happens with
probability $p_0^k < 1$, the probability that $x_n = x_N$ for all $n \ge N$ is
zero. Thus case 1 is eliminated, and we have established that $\lim x_n = +\infty$
a.e. That is, for any $k$ and for almost all $\omega$, $x_n(\omega) = k$ for only
finitely many $n$.

Example 3: When the $f_n$'s are the difference quotients of a monotone
function $f$, the pair $(f_n, \mathcal{R}_n)$ forms a non-negative martingale. By
Corollary 3-13, $f_n$ converges to a limiting function at almost all points.
The limiting function will turn out to be the derivative of $f$. However,
our argument considered only nested partitions, and hence it provides
only part of the proof of the existence of $f'$ a.e.

Next we consider an example where Theorem 3-15 is not applicable. 

Example 4: Suppose that a player plays the fair game of Example 1
and that he stops the first time that he is ahead. The process is
stopped when $s_n(\omega) = 1$. We have already seen that this is a stopping
time. Then $s_0 = 0$, and $s_t = 1$ a.e. Hence $M[s_0] = 0 \ne 1 = M[s_t]$.
Why is the theorem not applicable? Condition (1) is clearly satisfied.
However,

$$0 = M[s_n] = \int_{\{t \le n\}} s_n\,d\mu + \int_{\{t > n\}} s_n\,d\mu.$$

The first term equals the probability that 1 has been reached and tends
to 1. Hence the second term tends to $-1$, not to 0. Thus condition
(2) is violated.

In practice this gambling strategy cannot be implemented, since the 
gambler would need infinite capital to be able to absorb arbitrarily 
large losses. 
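
The failure of condition (2) in Example 4 can be seen numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: it takes the steps to be $\pm 1$ with probability $\tfrac12$ each (Example 1 also allows steps of 0), treats $s_n$ as the stopped process, and uses the hypothetical name `split_mean`. It returns the two terms of the decomposition of $M[s_n]$ exactly.

```python
from fractions import Fraction

def split_mean(n):
    """Return (Pr[t <= n], integral of s_n over {t > n}) for the +/-1 fair walk.

    t is the first time the walk reaches 1; arithmetic is exact.
    """
    half = Fraction(1, 2)
    alive = {0: Fraction(1)}   # distribution of s_k restricted to {t > k}
    reached = Fraction(0)      # Pr[t <= k]; on this event the stopped s_n equals 1
    for _ in range(n):
        nxt = {}
        for pos, pr in alive.items():
            for step in (-1, 1):
                new = pos + step
                if new == 1:
                    reached += pr * half
                else:
                    nxt[new] = nxt.get(new, Fraction(0)) + pr * half
        alive = nxt
    tail = sum(pos * pr for pos, pr in alive.items())
    return reached, tail
```

For every $n$ the two returned values sum to 0, so as the first tends to 1 the second is forced toward $-1$.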

6. Law of Large Numbers 

The Strong Law of Large Numbers, which may be derived from the 
Martingale Convergence Theorem, is formulated as follows. 

Theorem 3-19: Let $\{y_n\}$ be a sequence of independent identically
distributed random variables with finite mean $a = M[y_n]$. If
$s_n = y_1 + \cdots + y_n$ and $s_n^* = s_n/n$, then

$$\Pr[\lim_n s_n^* = a] = 1.$$

We shall prove the theorem for the special case where the random
variables have finite range; say, $\Pr[y_n = j] = p_j$ for a finite number of
$j$'s. For more generally applicable proofs the reader is referred to the
bibliography. (See Feller [1957], pp. 244-245, for an elementary proof
in the case $y_n$ is denumerable-valued; and see Doob [1953], pp. 334-342,
for a general proof using the Martingale Convergence Theorem.)

We introduce a useful tool, the generating function

$$\varphi(t) = \sum_j p_j t^j.$$

It is a well-behaved function satisfying $\varphi(1) = 1$ and $\varphi'(1) = a$. Let

$$f(s, n) = \frac{t^s}{\varphi(t)^n}$$

for some $t > 0$ to be specified. We shall show that $\{f(s_n, n)\}$ is a
martingale. The conditional mean of $f(s_{n+1}, n+1)$ given $s_n = k$ is

$$\sum_j p_j \frac{t^{k+j}}{\varphi(t)^{n+1}} = \frac{t^k \varphi(t)}{\varphi(t)^{n+1}} = \frac{t^k}{\varphi(t)^n} = f(k, n).$$

Since $M[f(s_0, 0)] = 1 < \infty$, $\{f(s_n, n)\}$ is a martingale, and it is clearly
non-negative. Thus, by the Martingale Convergence Theorem,
$f(s_n, n)$ converges to a finite limit a.e., where

$$f(s_n, n) = \frac{t^{s_n}}{\varphi(t)^n} = \left[\frac{t^{s_n^*}}{\varphi(t)}\right]^n.$$

Fix $\epsilon > 0$, let $b = a + \epsilon$, and form the function $g(t) = t^b/\varphi(t)$. Since
$g(1) = 1$ and $g'(1) = b - a > 0$, we have $g(t_0) > 1$ for some sufficiently
small $t_0$ greater than 1. If $s_n^*(\omega) > b$, we have

$$\frac{t_0^{s_n^*(\omega)}}{\varphi(t_0)} \ge g(t_0) > 1,$$

and hence if $s_n^*(\omega) > b$ for infinitely many $n$, then $f(s_n(\omega), n)$ has a
subsequence tending to $+\infty$. By the convergence of $f$, we conclude
that

$$\limsup s_n^*(\omega) \le b = a + \epsilon$$

a.e. Similarly, by choosing a suitable $t_0 < 1$ we would find that

$$\liminf s_n^*(\omega) \ge a - \epsilon$$

a.e. Since $\epsilon$ is arbitrary, $s_n^*$ converges to $a$ with probability one.
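
The martingale identity used in this proof can be checked with exact arithmetic. The following sketch is only an illustration: the step distribution, the value of $t$, and the helper names `phi`, `f`, and `conditional_mean` are assumptions made for the example. It compares the conditional mean of $t^{s_{n+1}}/\varphi(t)^{n+1}$ given $s_n = k$ with $t^k/\varphi(t)^n$.

```python
from fractions import Fraction

def phi(p, t):
    """Generating function: sum of p_j t^j, with p a dict mapping j to p_j."""
    return sum(pj * t**j for j, pj in p.items())

def f(p, t, s, n):
    """The function f(s, n) = t^s / phi(t)^n for an integer s."""
    return t**s / phi(p, t)**n

def conditional_mean(p, t, k, n):
    """Mean of f(s_{n+1}, n+1) given s_n = k, averaging over the next step j."""
    return sum(pj * f(p, t, k + j, n + 1) for j, pj in p.items())
```

With exact rational inputs, `conditional_mean(p, t, k, n)` and `f(p, t, k, n)` should be equal for every integer $k$ and every $n \ge 0$.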

7. Problems 

1. Show that if $\{f_n\}$ are denumerable-valued independent random variables,
then

$$M[f_{n+1} \mid f_0 \wedge \cdots \wedge f_n] = M[f_{n+1}].$$

Show also that

$$\Pr[f_0 \in A_0 \wedge f_1 \in A_1 \wedge \cdots \wedge f_n \in A_n] = \Pr[f_0 \in A_0] \cdots \Pr[f_n \in A_n].$$

2. Let $\{f_n\}$ be a sequence of denumerable-valued independent random
variables and let $\{g_n\}$ be a sequence of Borel-measurable functions
defined on the real line. Show that $\{g_n(f_n)\}$ is a sequence of independent
random variables. If the $f_n$ are identically distributed and all the
$g_n$ are equal to $g$, show that the $g(f_n)$'s are identically distributed.

3. Verify that Examples 2 and 3 of Section 2 are martingales. 

4. Prove that if $(f_n, \mathcal{R}_n)$ and $(g_n, \mathcal{R}_n)$ are martingales, then so is
$(f_n + g_n, \mathcal{R}_n)$. Does the same hold for $(f_n g_n, \mathcal{R}_n)$?

5. Prove that if $(f_n, \mathcal{R}_n)$ and $(g_n, \mathcal{R}_n)$ are non-negative supermartingales,
then so is $(\min(f_n, g_n), \mathcal{R}_n)$.

6. Prove that if $(f_n, \mathcal{R}_n)$ is a martingale, then $(|f_n|, \mathcal{R}_n)$ is a submartingale.
If the $f_n$ form a martingale on their cross partitions, do the $|f_n|$ form a
submartingale on their own cross partitions?

7. Let $(f_n, \mathcal{R}_n)$ be a submartingale. Prove that, for any $c > 0$,

$$c \Pr\left[\max_{j \le n} f_j \ge c\right] \le M[|f_n|].$$

[Hint: Take as stopping time the first time c is surpassed, or n, whichever
comes first.]

8. Prove that every submartingale can be written as the sum of a martingale
and an increasing submartingale. [Hint: If $(x_n, \mathcal{R}_n)$ is given, put
$f_0 = 0$, $f_n = M[x_n \mid \mathcal{R}_{n-1}] - x_{n-1}$, $z_n = f_0 + \cdots + f_n$, and $y_n = x_n - z_n$.]

9. Consider the following stochastic process: A white and a black ball are
placed in an urn. One ball is drawn, and this ball is replaced by two of
the same color. Let $f_n$ be the fraction of white balls after $n$ experiments.

(a) Prove that $\{f_n\}$ is a martingale.

(b) Prove that it converges.

(c) Prove that the limiting distribution has mean 1/2.

(d) Prove that the probability of ever reaching a fraction J of white 
balls is at most f. [Hint: Use Problem 7.] 

10. We consider an experiment with each outcome one of two possible outcomes
H or T. We have two different hypotheses A and B as to how the
underlying measure for a stochastic process should be assigned. For a
given sequence HHT...H we denote by $p_n$(HHT...H) the assignment
under hypothesis A and by $r_n$(HHT...H) the assignment under
hypothesis B. Let $f_n$ be defined by

$$f_n(\text{HHT}\ldots\text{H}) = \frac{p_n(\text{HHT}\ldots\text{H})}{r_n(\text{HHT}\ldots\text{H})}.$$

(a) Show that if the measure is defined by hypothesis B, then $\{f_n\}$ is a
martingale and hence converges a.e.

(b) Specialize to the case of tossing a biased coin. Let hypotheses A
and B be that the probability of heads is, respectively, $p$ and $r$.
Show that if the measure is defined by hypothesis B, then $\{f_n\}$
converges to 0 a.e. if $p \ne r$.



Problems 11 to 13 concern a type of stochastic process employed by psychologists
in learning theory. The state space consists of the rational points on
the unit interval, and we are given two rational parameters, $0 < b \le a < 1$.
From a point $x$ the process moves to $bx + (1 - a)$ with probability $x$, or to
$bx$ with probability $1 - x$. It is started at an interior point $x_0$.

11. Show that if $b = a$, then $\{x_n\}$ is a martingale.

12. Prove that the process converges either to 0 or to 1, and compute the
probability of going to 1 as a function of the starting position $x_0$.

13. Show that if $b < a$, then $\{x_n\}$ is a supermartingale and the process
converges to 0 a.e.

Problems 14 to 18 concern the notion of conditional mean given a Borel
field and show how it generalizes the notion of conditional mean given a
partition. It will be necessary to know the Radon-Nikodym Theorem to
solve Problem 14. Let $(\Omega, \mathcal{B}, \mu)$ be a probability space in which every
subset of a set of measure zero is measurable.

14. If $f$ is a random variable such that $M[|f|] < \infty$ and if $\mathcal{G}$ is a Borel field
contained in $\mathcal{B}$, show that there exists a function $M[f \mid \mathcal{G}]$ defined on $\Omega$
satisfying

(i) $M[f \mid \mathcal{G}]$ is measurable with respect to the augmented field
obtained from $\mathcal{G}$.

(ii) If $E$ is a set in $\mathcal{G}$, then $\int_E f\,d\mu = \int_E M[f \mid \mathcal{G}]\,d\mu$.

Show that $M[f \mid \mathcal{G}]$ is unique in this sense: Any two functions satisfying
(i) and (ii) differ only on a set of $\mu$-measure zero. We can therefore
define any determination of $M[f \mid \mathcal{G}]$ to be the conditional mean of $f$
given $\mathcal{G}$.

15. Show that if $\mathcal{G}$ is the Borel field generated by a partition $\mathcal{R}$, then
$M[f \mid \mathcal{G}] = M[f \mid \mathcal{R}]$ a.e. Show that if $\mathcal{G} = \mathcal{B}$, then $M[f \mid \mathcal{G}] = f$ a.e.

16. State and prove a result for these conditional means in analogy with
Proposition 3-3. [Hint: In (3), the condition that $g$ be constant on cells
of $\mathcal{R}$ should be replaced with the condition that $g$ be measurable with
respect to $\mathcal{G}$.]

17. Generalize the definition of martingale in Definition 3-5, using these
conditional means. Verify that the statements and proofs of Lemma 3-6,
Proposition 3-7, Lemmas 3-9 and 3-10, Proposition 3-11, Theorem 3-12,
and Corollary 3-13 apply with only minor changes to this generalized
notion of martingale. Perform the same verification for Lemma 3-14,
Theorem 3-15, Corollary 3-16, and Proposition 3-17.

18. Prove the following generalization of Proposition 3-18: Let
$\mathcal{G}_0 \subset \mathcal{G}_1 \subset \cdots$ be an increasing sequence of Borel fields in $\mathcal{B}$, let $\mathcal{G}$ be
the least Borel field containing $\bigcup_n \mathcal{G}_n$, and let $f$ be a random variable
with $M[|f|] < \infty$. Then $(M[f \mid \mathcal{G}_n], \mathcal{G}_n)$ is a martingale, and

$$\lim_{n \to \infty} M[f \mid \mathcal{G}_n] = M[f \mid \mathcal{G}]$$

a.e.



PROPERTIES OF MARKOV CHAINS



1. Markov chains 

During all of our discussion of Markov chains, we shall wish to 
confine ourselves to stochastic processes defined on a sequence space. 
We have shown that an arbitrary stochastic process may be considered 
as a process on a suitable $\Omega$ in which the outcome functions are
coordinate functions. We see, therefore, that in a sense no generality 
is lost by discussing Markov chains in terms of sequence space. 

Definition 4-1: Let $(\Omega, \mathcal{B}, \mu)$ be a sequence space with a denumerable
stochastic process $\{x_n\}$ defined from $\Omega$ to a denumerable state space $S$
of more than one element. The process is called a denumerable
Markov process if

$$\Pr[x_{n+1} = c_{n+1} \mid x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1} \wedge x_n = c_n] = \Pr[x_{n+1} = c_{n+1} \mid x_n = c_n]$$

for any $n$ and for any $c_0, \ldots, c_{n+1}$ such that

$$\Pr[x_0 = c_0 \wedge \cdots \wedge x_n = c_n] > 0.$$

The condition that defines a Markov process is known as the Markov
property. If a denumerable Markov process has the property that for
any $m$ and $n$ and for any $c_n$, $c_{n+1}$ such that $\Pr[x_m = c_n] > 0$ and
$\Pr[x_n = c_n] > 0$,

$$\Pr[x_{m+1} = c_{n+1} \mid x_m = c_n] = \Pr[x_{n+1} = c_{n+1} \mid x_n = c_n]$$

holds, then the process is called a denumerable Markov chain. The
condition that defines a Markov chain is called the Markov chain
property.

All Markov chains that we shall discuss will be denumerable. From
Proposition 2-8 we immediately have the following result.

Proposition 4-2: The measure on the space for a Markov chain is
completely determined by

(1) the starting probabilities, $\Pr[x_0 = i]$, and

(2) the one-step transition probabilities, the common value of
$\Pr[x_{n+1} = j \mid x_n = i]$ for all $n$ such that $\Pr[x_n = i] > 0$.

If $S$ is the set of states for a Markov chain, we customarily denote
representative elements of $S$ by $i$, $j$, and 0. For any Markov
chain we define on the set $S$ a row vector $\pi$ and a square matrix $P$ by

$$\pi_i = \Pr[x_0 = i],$$

$$P_{ij} = \Pr[x_{n+1} = j \mid x_n = i], \quad \text{where } \Pr[x_n = i] > 0.$$

The vector $\pi$ is the starting distribution, and the matrix $P$ is the
transition matrix for the chain. They satisfy the properties $\pi \ge 0$,
$\pi\mathbf{1} = 1$, and $P \ge 0$. If $\Pr[x_n = i] = 0$ for all $n$, then the $i$th row of
$P$ is not covered by the above definition, and we shall agree to take
$P_{ij} = 0$ for all $j$ in this case.

The definition of $P$ implies that, for each $i$, $(P\mathbf{1})_i = 1$ or 0. It will
be convenient, however, to think of Markov chains from a point of view
which allows $P$ to be any matrix with $P \ge 0$ and $P\mathbf{1} \le \mathbf{1}$. To do so,
we shall admit the possibility that some of the paths in the sequence
space are of finite length. Intuitively a path of finite length is one
along which the Markov chain can "disappear"; the process disappears
from state $i$ with probability $1 - (P\mathbf{1})_i$. Mathematically paths of
finite length can be introduced as follows: Suppose a Markov chain with
state space $S$ has a distinguished state 0 for which $\pi_0 = 0$ and $P_{00} = 1$.
We shall sometimes identify entry to state 0 with the act of disappearing
in a process with state space $S - \{0\}$ which also will be called a Markov
chain. The transition matrix for the new Markov chain is the same as
the original one except that the 0th row and column are omitted. Any
path in the original process which has 0 as an outcome is now thought of
as a path of finite length which terminates before the first occurrence of
0. The original process can be recovered from the new process by
re-introducing state 0 to the state space and by requiring that the
transition probabilities to state 0 in the original process be the same as
the probabilities of disappearing in the new process. With these conventions
a Markov chain determines a vector $\pi$ and a matrix $P$ with
$\pi \ge 0$, $\pi\mathbf{1} = 1$, $P \ge 0$, and $P\mathbf{1} \le \mathbf{1}$.

Conversely, if $\pi$ is a row vector defined on $S$ for which $\pi \ge 0$ and
$\pi\mathbf{1} = 1$ and if $P$ is a square matrix defined on $S$ for which $P \ge 0$ and
$P\mathbf{1} \le \mathbf{1}$, then $\pi$ and $P$ define a unique Markov chain with state space $S$
by Theorem 2-4. If $(P\mathbf{1})_i < 1$, then the process has positive probability
equal to $1 - (P\mathbf{1})_i$ of disappearing each time it is in state $i$. Whenever
convenient, the act of disappearing may be thought of as entry to an
ideal state adjoined to $S$.
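
The passage between a matrix with row sums at most 1 and a chain with an adjoined ideal state can be sketched in a few lines. This is an illustrative sketch only; the function name `adjoin_ideal_state` and the list-of-rows representation are assumptions made here, and the ideal state is placed first so that it plays the role of state 0.

```python
from fractions import Fraction

def adjoin_ideal_state(P):
    """Extend a matrix with P >= 0 and row sums <= 1 to one with row sums 1.

    The new first state is the ideal (absorbing) state; the chance of entering
    it from state i is the disappearance probability 1 - (P1)_i.
    """
    for row in P:
        if any(x < 0 for x in row) or sum(row) > 1:
            raise ValueError("need P >= 0 and row sums at most 1")
    size = len(P)
    full = [[Fraction(1)] + [Fraction(0)] * size]   # ideal state never leaves
    for row in P:
        full.append([1 - sum(row)] + list(row))
    return full
```

Deleting the first row and column of the result recovers the original matrix, in line with the convention described above.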

Any state $i$ of a Markov chain $P$ for which $P_{ii} = 1$ is said to be an
absorbing state. If outcome $i$ occurs at some time, the process is said
to enter the absorbing state and to become absorbed. It is easily seen
that once the process has been absorbed, it is impossible for it to leave
the absorbing state.

If $P$ is a Markov chain with starting distribution $\pi$ and if $q$ is a statement
about the process, we denote the probability of $q$ by $\Pr_\pi[q]$. If

$$\pi_k = \begin{cases} 1 & \text{when } k = i \\ 0 & \text{otherwise,} \end{cases}$$

we may alternatively write $\Pr_i[q]$. Similarly if $f$ is a random variable,
we write $M_\pi[f]$ or $M_i[f]$, depending on the starting distribution. With
this notational convention, we are free to discuss a whole class of
Markov chains at once. The class contains all chains whose transition
matrices are some fixed matrix $P$, and two chains of the class differ
only in their starting distributions. Most of our treatment of Markov
chains will be on this more natural level, where a matrix $P$, but no
distribution $\pi$, is specified.

We conclude this section with a simple but useful proposition. Its 
proof is left to the reader. 

Proposition 4-3: If $P$ is a Markov chain, then for $n \ge 0$,

$$\Pr_\pi[x_n = j] = (\pi P^n)_j$$

and

$$\Pr_i[x_n = j] = (P^n)_{ij}.$$

We shall use the notation $P^{(n)}_{ij}$ for $(P^n)_{ij}$, the $n$-step probability from
$i$ to $j$.

2. Examples of Markov chains 

We give ten examples of Markov chains; we shall refer to all of them 
from time to time. 

Example 1: Weather in the Land of Oz. 

The Land of Oz is blessed by many things, but good weather is not 
one of them. They never have two nice days in a row. If they have a 
nice day, they are just as likely to have snow as rain the next day. 
If they have snow (or rain), they have an even chance of having the 
same the next day. If there is a change from snow or rain, only half of 
the time is this a change to a nice day.

The weather is conveniently represented as a Markov chain with the
three states $S = \{\text{Rain}, \text{Nice}, \text{Snow}\}$. The transition matrix becomes

$$
P = \begin{array}{c|ccc}
  & R & N & S \\ \hline
R & 1/2 & 1/4 & 1/4 \\
N & 1/2 & 0 & 1/2 \\
S & 1/4 & 1/4 & 1/2
\end{array}
$$
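
Proposition 4-3 can be tried out on the Land of Oz chain. The sketch below is an illustration, not part of the original text: the helper names `mat_mul`, `mat_pow`, and `distribution_at` are hypothetical, the states are ordered (Rain, Nice, Snow), and the matrix entries are the ones implied by the verbal description of the weather.

```python
from fractions import Fraction as F

# Transition matrix for the Land of Oz, states ordered (Rain, Nice, Snow).
OZ = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 2), F(0),    F(1, 2)],
    [F(1, 4), F(1, 4), F(1, 2)],
]

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(P, n):
    """P^n for n >= 0, with P^0 the identity matrix."""
    size = len(P)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

def distribution_at(pi, P, n):
    """Row vector pi P^n: the distribution of x_n under starting distribution pi."""
    return mat_mul([pi], mat_pow(P, n))[0]
```

Starting from a nice day, `distribution_at([F(0), F(1), F(0)], OZ, 2)` gives the weather distribution two days later, which is the Nice row of $P^2$.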

Example 2: Chain with a set of states E made absorbing. 

Let $P$ be an arbitrary Markov chain with $S$ the set of states. Let a
subset $E$ of $S$ be specified. We modify the original process by requiring
that if the process is ever in a state $j$ of $E$, it does not leave that state.
The new process is also a Markov chain; its transition matrix $P'$ differs
from the $P$-matrix in that $P'_{jj} = 1$ and $P'_{ji} = 0$ for every $j \in E$ and for
every $i \ne j$. The new process is called the chain with $E$ made
absorbing.
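
The modification in Example 2 amounts to overwriting the rows of $P$ indexed by $E$. A minimal sketch, assuming states are numbered $0, 1, \ldots$ and the matrix is a list of rows (the name `make_absorbing` is a hypothetical choice):

```python
from fractions import Fraction

def make_absorbing(P, E):
    """Return P' equal to P except that every state in E is made absorbing."""
    size = len(P)
    new = []
    for j, row in enumerate(P):
        if j in E:
            # Row j becomes the unit vector: P'_jj = 1 and P'_ji = 0 for i != j.
            new.append([Fraction(int(i == j)) for i in range(size)])
        else:
            new.append(list(row))
    return new
```

The input matrix is left unchanged, so the original chain and the chain with $E$ made absorbing can be compared side by side.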

Example 3: Finite drunkard’s walk. 

A drunkard walks randomly on a street between his house and a
lake, starting at a bar in the middle. He has some idea of which way
is home. The steps along the way are labeled by the integers from 0
to $n$; the bar, some integer $i$ between 0 and $n$, is the starting state, and
the drunkard moves one step toward home (state $n$) with probability $p$
and one step toward the lake (state 0) with probability $q = 1 - p$.
States 0 and $n$ are absorbing. We assume that $p \ne 0$ and $p \ne 1$.
The transition matrix is

$$
P = \begin{array}{c|ccccccc}
    & 0 & 1 & 2 & 3 & \cdots & n-1 & n \\ \hline
0   & 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
1   & q & 0 & p & 0 & \cdots & 0 & 0 \\
2   & 0 & q & 0 & p & \cdots & 0 & 0 \\
\vdots & & & & & & & \\
n-1 & 0 & 0 & 0 & 0 & \cdots & 0 & p \\
n   & 0 & 0 & 0 & 0 & \cdots & 0 & 1
\end{array}
$$

The reader should verify that if $p = \tfrac12$, then $\{x_n\}$ is a martingale and
that if $p \ne \tfrac12$, then $\{(q/p)^{x_n}\}$ is a martingale.
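
The verification left to the reader in Example 3 comes down to checking that a column vector $f$ satisfies $f = Pf$. A sketch under stated assumptions (the helper names `drunkard_matrix` and `apply` are hypothetical, and exact rational arithmetic is used so that equality can be tested directly):

```python
from fractions import Fraction

def drunkard_matrix(n, p):
    """Transition matrix of the finite drunkard's walk on states 0..n."""
    q = 1 - p
    P = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    P[0][0] = Fraction(1)          # the lake is absorbing
    P[n][n] = Fraction(1)          # home is absorbing
    for i in range(1, n):
        P[i][i + 1] = p            # one step toward home
        P[i][i - 1] = q            # one step toward the lake
    return P

def apply(P, f):
    """Column vector Pf: the one-step mean of f from each state."""
    return [sum(a * b for a, b in zip(row, f)) for row in P]
```

With $p = \tfrac13$ the vector $f_i = (q/p)^i = 2^i$ satisfies `apply(P, f) == f`, and with $p = \tfrac12$ the vector $f_i = i$ does.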



Example 4: Infinite drunkard’s walk. 

For this process, which is an extension of the one in Example 3, the
states are the non-negative integers, and state 0 is absorbing. For
each $i > 0$, we have

$$P_{i,i+1} = p, \quad P_{i,i-1} = q, \quad \text{and } p + q = 1.$$

We assume $p \ne 0$ and $q \ne 0$. The transition matrix is

$$
P = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & \cdots \\ \hline
0 & 1 & 0 & 0 & 0 & \cdots \\
1 & q & 0 & p & 0 & \cdots \\
2 & 0 & q & 0 & p & \cdots \\
\vdots & & & & &
\end{array}
$$

If $p = \tfrac12$, then $\{x_n\}$ is a martingale, and if $p \ne \tfrac12$, then $\{(q/p)^{x_n}\}$ is a
martingale.

Example 5: Basic example. 

A sequence of tasks is to be performed in a certain order, each with
its own probability of success. Success means that the process goes
to the next state; failure means that the process must start over at state
0. Thus the states are the non-negative integers, and with each
positive integer $i$ we associate two probabilities $p_i$ and $q_i$ such that
$p_i + q_i = 1$. The value $p_i$ is the transition probability from state
$i - 1$ to state $i$, and $q_i$ is the transition probability from state $i - 1$ to
0. Thus $p_i$ is the probability of succeeding in the $i$th task. We
assume that $p_i < 1$ for infinitely many $i$, and we normally assume that
$p_i > 0$ for every $i$. The transition matrix is

$$
P = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & \cdots \\ \hline
0 & q_1 & p_1 & 0 & 0 & \cdots \\
1 & q_2 & 0 & p_2 & 0 & \cdots \\
2 & q_3 & 0 & 0 & p_3 & \cdots \\
\vdots & & & & &
\end{array}
$$

In connection with this example, we define a row vector $\beta$ by

$$\beta_0 = 1, \qquad \beta_i = \prod_{k=1}^{i} p_k.$$

Then $\beta_i$ is the probability of $i$ successes in a row after the process starts
at 0. The reader should verify that a necessary and sufficient condition
for $\beta = \beta P$ is that $\lim_{i \to \infty} \beta_i = 0$. This Markov chain will be referred
to hereafter as "the basic example."
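
The condition for $\beta = \beta P$ in the basic example rests on a telescoping sum: the entries $(\beta P)_j = \beta_{j-1} p_j = \beta_j$ agree automatically for $j \ge 1$, while $(\beta P)_0 = \sum_i \beta_i q_{i+1} = \sum_i (\beta_i - \beta_{i+1}) = 1 - \lim \beta_i$. The sketch below checks the partial sums exactly; the particular choice $p_k = k/(k+1)$ and the names `example_p`, `beta`, and `partial_first_entry` are assumptions made for illustration.

```python
from fractions import Fraction

def example_p(k):
    """A hypothetical success probability p_k = k / (k + 1) for task k >= 1."""
    return Fraction(k, k + 1)

def beta(p, i):
    """beta_i = p_1 p_2 ... p_i, with beta_0 = 1; p is a function k -> p_k."""
    out = Fraction(1)
    for k in range(1, i + 1):
        out *= p(k)
    return out

def partial_first_entry(p, K):
    """Sum over i < K of beta_i q_{i+1}: a partial sum of the entry (beta P)_0."""
    return sum(beta(p, i) * (1 - p(i + 1)) for i in range(K))
```

Here `partial_first_entry(example_p, K)` equals `1 - beta(example_p, K)` for every $K$, so $(\beta P)_0 = \beta_0 = 1$ exactly when $\beta_K \to 0$; for this choice $\beta_K = 1/(K+1)$.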

Example 6: Sums of independent random variables. 

The states of a Markov chain $P$ are the elements of an index set $I$ on
which an operation of addition is defined in such a way that $I$ becomes
an abelian group. A probability distribution $\{p_i\}$ defined on $I$ satisfies
$p_i \ge 0$ and $\sum_i p_i = 1$. The Markov chain $P$ is defined by $P_{ij} = p_{j-i}$.

The name of this Markov chain is derived from thinking of performing
independent experiments which have probability $p_i$ of outcome
$i$. The states of the chain are the partial sums of these results, and
the sum changes from $i$ to $j$ with probability $P_{i,i+k} = p_k$ if $j = i + k$.

For the case in which the index set $I$ is the set of integers with the
usual concept of addition defined on them, martingales arise as in
Example 1 of Section 3-2. We shall apply these ideas in Chapter 5.

Example 7 : Two classes of random walks. 

We shall be concerned especially with two kinds of random walks.
The symmetric random walk in $n$ dimensions is defined to be a sums of
independent random variables process on the lattice of integer points
in $n$-dimensional Euclidean space. The transition probability from
one lattice point to another is $(2n)^{-1}$ if the two points are a Euclidean
distance of one unit apart; the transition probability is zero otherwise.
Thus, from each point the process moves to one of $2n$ neighboring
points with probability $(2n)^{-1}$.

A second kind of random walk with which we shall be concerned is a
sums of independent random variables process on the integers with
$P_{i,i+1} = p$ and $P_{i,i-1} = q$ for every $i$. We shall call this process the
$p$-$q$ random walk. If $p = q = \tfrac12$, then $\{x_n\}$ is a martingale, and if
$p \ne \tfrac12$, then $\{(q/p)^{x_n}\}$ is a martingale.

Example 8: General random walks on the line. 

The state space for a random walk on the line is the set of integers,
and for each integer $i$, three probabilities $p_i$, $q_i$, and $r_i$ with
$p_i + q_i + r_i = 1$ are specified. A Markov chain is defined by

$$P_{i,i+1} = p_i, \qquad P_{i,i-1} = q_i, \qquad P_{i,i} = r_i.$$

The drunkard's walk and the $p$-$q$ random walk are both special cases.

An important case of random walks on the line which we have not
discussed yet is the reflecting random walk. For this chain the process
is started at a state which is a non-negative integer, and the assumption
is made that $q_0 = 0$. The process never reaches the negative integers,
and the state space may just as well be taken as $\{0, 1, 2, \ldots\}$.

Example 9: Branching process. 

The state space is the set of non-negative integers, and a fixed
probability distribution $p = \{p_0, p_1, p_2, \ldots\}$ is specified. Suppose the
mean $\sum_k k p_k$ of $p$ is $m$. Let $\{y_n\}$ be a sequence of non-negative integer-valued
independent random variables with common distribution $p$, and
set $s_n = y_1 + \cdots + y_n$. Let $p_j^{(n)} = \Pr[s_n = j]$. Then the branching
process is defined to be a Markov chain with transition probabilities
$P_{ij} = p_j^{(i)}$.

The usual model is the following. A species of bacteria has the
distribution $p$ representative of the number of offspring one such
bacterium has before it dies. The value of $y_k$ represents the number of
offspring the $k$th bacterium has while it is alive, and $s_n$ represents the
total number of bacteria produced by $n$ bacteria in one generation of
the colony. The $r$th position, $x_r$, of the stochastic process is the
number of bacteria in the $r$th offspring generation.

As we have noted, the branching process is a Markov chain. Let
$\{x_n\}$ be the outcome functions for the chain started in state 1 (that is,
with one bacterium in the colony initially), and suppose the mean $m$ is
finite. Then $\{x_n/m^n\}$ with its natural partition forms a martingale.
The reader should verify that $M[|x_n/m^n|]$ is finite; we shall show that
$M[x_{n+1}/m^{n+1} \mid x_0 \wedge \cdots \wedge x_n/m^n] = x_n/m^n$. First we note that

$$M[s_n] = \sum_{i=1}^{n} M[y_i] = nm,$$

so that if we know that the process is in state $r$, then the mean state
that it is in after the next step is $rm$. Then

$$
\begin{aligned}
M[x_{n+1}/m^{n+1} \mid x_0 \wedge x_1/m \wedge \cdots \wedge x_n/m^n]
&= M[x_{n+1}/m^{n+1} \mid x_n/m^n] && \text{by the Markov property} \\
&= \frac{1}{m^{n+1}} M[x_{n+1} \mid x_n] \\
&= \frac{1}{m^{n+1}}\, x_n m && \text{by the remarks above} \\
&= x_n/m^n.
\end{aligned}
$$
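
The transition probabilities of the branching process, and the fact that the mean state after one step from state $r$ is $rm$, can be computed directly when the offspring distribution has finite support. The sketch below makes that assumption; the names `convolve`, `branching_row`, and `mean` are hypothetical helpers, and row $i$ is obtained as the $i$-fold convolution of the offspring distribution with itself.

```python
from fractions import Fraction

def convolve(a, b):
    """Distribution of the sum of two independent non-negative integer variables."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def branching_row(p, i):
    """Row i of the transition matrix: the distribution of s_i = y_1 + ... + y_i."""
    row = [Fraction(1)]            # s_0 = 0 with probability one
    for _ in range(i):
        row = convolve(row, p)
    return row

def mean(dist):
    """Mean of a distribution given as a list indexed by value."""
    return sum(j * pj for j, pj in enumerate(dist))
```

For an offspring distribution `p` with mean `m = mean(p)`, the quantity `mean(branching_row(p, r))` should equal `r * m`, which is the one-step fact used in the martingale computation above.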





Example 10: Tree process. 

Let $\{x_n\}$ be a denumerable stochastic process defined on sequence
space, and let $S$ be the set of states. Define a set $T$ to be the set of all
finite sequences of elements in $S$. Define a new stochastic process as
follows: If $t$ and $u$ are elements of $T$ for which

$$t = (c_0, c_1, c_2, \ldots, c_n)$$

and

$$u = (c_0, c_1, c_2, \ldots, c_n, c_{n+1}),$$

define

$$\Pr[y_{n+1} = u \mid y_n = t] = \Pr[x_{n+1} = c_{n+1} \mid x_0 = c_0 \wedge \cdots \wedge x_n = c_n].$$

The process $\{y_n\}$ defined from the same space to the set $T$ is a Markov
chain; the entire history of the original process up to time $n$ is contained
in the knowledge of the value of the $n$th outcome function for
the new process.

An example of a tree process is obtained by considering an individual's
voting history in successive years. Letting D and R represent the
political parties, we see that his possible histories can be conveniently
represented as a tree:



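[Figure: tree of voting histories, with root Start and one branch for each
sequence of D and R votes, such as RDR and RDD.]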



The chain is in each state — D, R, DD, DRR, etc. — at most once. 



3. Applications of martingale ideas 

Let $P$ be the transition matrix for a Markov chain. A column vector
$f$ is said to be a $P$-regular function, or simply a regular function, if
$f = Pf$. The function is superregular if $f \ge Pf$; it is subregular if
$f \le Pf$.

The reader should convince himself that the regularity of a function
is a condition of the following form: At each point of the domain, the
value of the function is equal to the average value of the function at
neighboring points. By neighboring points we mean those states that
it is possible for the process to reach in one step, and by average value
we mean the average obtained by weighting the function values at
neighboring points by the transition probabilities to those states. A
function $f$ is said to be regular at a point $j$ if $f_j = (Pf)_j$.

Regular measures may be defined analogously with regular functions.
A non-negative row vector $\pi$ is a regular measure if $\pi = \pi P$; it is
superregular if $\pi \ge \pi P$ and subregular if $\pi \le \pi P$.
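
For a finite matrix the three conditions on a function can be tested entry by entry. The following sketch is illustrative only; the names `apply` and `classify` and the string labels are assumptions made here, and finite truncations stand in for the denumerable case.

```python
from fractions import Fraction

def apply(P, f):
    """Column vector Pf: the transition-weighted average of f from each state."""
    return [sum(a * b for a, b in zip(row, f)) for row in P]

def classify(P, f):
    """Label f as 'regular', 'superregular', 'subregular', or 'none' for P."""
    Pf = apply(P, f)
    if all(x == y for x, y in zip(f, Pf)):
        return "regular"
    if all(x >= y for x, y in zip(f, Pf)):
        return "superregular"
    if all(x <= y for x, y in zip(f, Pf)):
        return "subregular"
    return "none"
```

A constant function is regular when every row of $P$ sums to 1, and it is superregular when some row sums are smaller than 1, since then $P\mathbf{1} \le \mathbf{1}$.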

Let $h$ be a $P$-regular function and let $h(x_n)$ denote $h_j$ if $x_n = j$.
Suppose $M[|h(x_n)|]$ is finite. We shall show that $(h(x_n), \mathcal{R}_n)$ is a
martingale, where $\mathcal{R}_n$ is the cross partition
determined by the outcome functions $x_0, x_1, \ldots, x_n$ for the Markov
chain. It is sufficient to show that

$$M[h(x_{n+1}) \mid x_0 \wedge \cdots \wedge x_n] = h(x_n).$$

On the cell of the cross-partition where $x_0 = i, \ldots, x_n = j$ and where

$$\Pr[x_0 = i \wedge \cdots \wedge x_n = j] > 0,$$

$$
\begin{aligned}
M[h(x_{n+1}) \mid x_0 \wedge \cdots \wedge x_n]
&= M[h(x_{n+1}) \mid x_0 = i \wedge \cdots \wedge x_n = j] \\
&= \sum_k h_k \Pr[x_{n+1} = k \mid x_0 = i \wedge \cdots \wedge x_n = j] \\
&= \sum_k P_{jk} h_k && \text{by the Markov chain property} \\
&= h_j && \text{since } h \text{ is regular} \\
&= h(x_n).
\end{aligned}
$$

Thus $(h(x_n), \mathcal{R}_n)$ is a martingale. Similarly, superregular functions are
associated naturally with supermartingales and subregular functions 
correspond to submartingales. The proofs differ from the above proof 
only by insertion of the appropriate inequality sign in the next to the 
last step. 

Most of our applications of martingale ideas we shall leave to the
next few chapters. We shall, however, settle some things about
branching processes at this time. Let $\{x_n\}$ be the outcome functions
for a branching process started in state 1, and suppose the mean
$m = \sum_k k p_k$ is finite. As we noted in Section 2, $\{x_n/m^n\}$ forms a
non-negative martingale, which by Corollary 3-13 converges almost everywhere
to a finite limiting function $g$. One can show that $g$ is not a
constant function; that is, the value of the limit of $x_n/m^n$ very
much depends upon the early history of the path. The exact distribution
of $g$, however, is an unsolved problem.

On the other hand, information about whether the process dies out
(by being absorbed at state 0) is not hard to obtain. Let
$\varphi(s) = \sum_j p_j s^j$ and suppose $r = \varphi(r)$, $r > 0$, and $r \ne 1$. Then $\{r^{x_n}\}$ is a
non-negative martingale. First suppose $r > 1$. Since $\{r^{x_n}\}$ is a
non-negative martingale, it converges to a finite limit almost everywhere,
and since $r > 1$, $x_n$ itself converges with probability one. Since $x_n$ is
integer-valued, $x_n$ is constant on almost all paths from some point on.
It is left as an exercise to show that the constant must be zero and that
the process therefore dies out with probability one. Next suppose
$r < 1$. Then $\{r^{x_n}\}$ is bounded and converges almost everywhere to a
limiting function $r^{x_\infty}$, which must be 0 or 1 (that is, $x_\infty = \infty$ or 0)
almost everywhere. By dominated convergence we have

$$r = M[r^{x_0}] = M[r^{x_\infty}] = 1 \cdot \Pr[\text{process dies out}] + 0 \cdot \Pr[\text{process does not die out}],$$

so that $r$ is the probability that the process dies out in the long run.
Finally, suppose $r = 1$ is the only non-negative root of $s = \varphi(s)$.
Then $m = 1$ and $\{x_n\}$ is a non-negative martingale. Once again we
must have $x_n \to 0$ with probability one, and the process is almost
certain to die out.

The reader should notice that the case $r = 1$ has the property that
$M[x_n] = 1$ for all $n$, whereas $M[\lim x_n] = 0$. The process in this case is
an example of a fair game whose final expected fortune is strictly less
than the starting fortune.
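
A root $r$ of $s = \varphi(s)$ can be approximated numerically. The sketch below does not follow the martingale argument of the text; it uses a different, standard device, namely iterating $\varphi$ starting from 0, which increases to the smallest non-negative root because $\varphi$ is non-decreasing on $[0, 1]$. The function names, the iteration count, and the sample offspring distributions are assumptions made for illustration.

```python
def phi(p, s):
    """Generating function: sum of p_j s^j for an offspring distribution p."""
    return sum(pj * s**j for j, pj in enumerate(p))

def extinction_probability(p, iterations=500):
    """Approximate the smallest non-negative root of s = phi(s).

    The k-th iterate of phi at 0 is the probability of extinction by
    generation k, so the iterates increase to the extinction probability.
    """
    s = 0.0
    for _ in range(iterations):
        s = phi(p, s)
    return s
```

For the offspring distribution $(\tfrac14, \tfrac14, \tfrac12)$ the equation $s = \varphi(s)$ reads $2s^2 - 3s + 1 = 0$, with roots $\tfrac12$ and 1, and the iteration approaches $\tfrac12$. Convergence is slow when $m$ is close to 1, so the fixed iteration count is only adequate away from that case.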



4. Strong Markov property 

The strong Markov property is a rigorous formulation of the following 
assertion about a Markov chain: If the present is known, then the 
probability of any statement depending on the future is independent 
of what additional information about the past is known. In this 
section we shall state and prove this result; our procedure will be first 
to prove a conceptually simpler special case and then to obtain the 
general theorem as an easy consequence. In the special case the time 
of the present will be a fixed time n, whereas in the general case the time 
of the present will be allowed to depend on the past history of the 
process. That is, the time of the present will be a random time. 
Knowledge of the present, then, means knowledge of the outcome at the 
time of the present. 

If $\omega = (c_0, c_1, c_2, \ldots, c_n, c_{n+1}, \ldots)$ is a point in a sequence
space, we agree to call the path

$$(c_n, c_{n+1}, \ldots)$$

by the name $\omega_n$.



Lemma 4-4: Let $\{p_k\}$ be a sequence of statements whose truth sets
are disjoint in pairs, let $\bigvee_k p_k$ be their disjunction, and suppose
$\Pr[\bigvee_k p_k] > 0$. If $p$ is a statement for which $\Pr[p \mid p_k] = c$ whenever
$\Pr[p_k] > 0$, then $\Pr[p \mid \bigvee_k p_k] = c$.

Proof: For each $k$,

$$c \cdot \Pr[p_k] = \Pr[p \wedge p_k].$$

Thus

$$c \sum_k \Pr[p_k] = \sum_k \Pr[p \wedge p_k].$$

Since the $p_k$ are disjoint statements, it follows from complete additivity
that $\sum_k \Pr[p_k] = \Pr[\bigvee_k p_k]$ and that $\sum_k \Pr[p \wedge p_k] = \Pr[p \wedge (\bigvee_k p_k)]$.
Thus $c = \Pr[p \wedge (\bigvee_k p_k)] / \Pr[\bigvee_k p_k]$, and the lemma follows.

Throughout the remainder of this section let $(\Omega, \mathcal{B}, \mu)$ be a fixed
sequence space and $\{x_n\}$ a fixed denumerable Markov chain defined on
$\Omega$. The field of cylinder sets will be denoted by $\mathcal{F}$ as in Section 2-1,
and the smallest Borel field containing $\mathcal{F}$ will be called $\mathcal{G}$.

Definition 4-5: The tail-field $\mathcal{T}_n$ is the smallest augmented Borel
field containing all truth sets of statements $x_n = c_n \wedge \cdots \wedge x_m = c_m$,
$m \ge n$. [Thus $\mathcal{T}_0 = \mathcal{B}$ and $\mathcal{T}_n \subset \mathcal{T}_{n-1}$.]

A statement relative to the field $\mathcal{F}_n$ defined in Section 2-1 is one whose
truth set depends only on outcomes $x_0, \ldots, x_n$, whereas a statement
relative to $\mathcal{T}_n$ is one whose truth set does not depend on outcomes
$x_0, \ldots, x_{n-1}$. Specifically, a set $R$ in $\mathcal{B}$ is in $\mathcal{T}_n$ if and only if, whenever
$\omega \in R$ and $\omega'$ is such that $\omega'_n = \omega_n$, then $\omega' \in R$.

We note that the class of sets $\mathcal{F} \cap \mathcal{T}_n$, being the intersection of
fields, is again a field. Moreover, $\mathcal{T}_n$ is the smallest augmented Borel
field containing $\mathcal{F} \cap \mathcal{T}_n$, so that the uniqueness statement of Theorem
1-19 applies: A probability measure on $\mathcal{T}_n$ is completely determined by
its values on $\mathcal{F} \cap \mathcal{T}_n$.

Lemma 4-6: Let $\{x_n\}$ be a Markov chain with starting distribution $\pi$,
let $q$ be a statement relative to $\mathcal{F}_{n-1}$, let $r$ be a statement relative to
$\mathcal{F} \cap \mathcal{T}_{n+1}$, and suppose $\Pr_\pi[q \wedge x_n = i] > 0$. Then

$$\Pr_\pi[r \mid q \wedge x_n = i] = \Pr_\pi[r \mid x_n = i] = \Pr_i[r'],$$

where $r'$ is so chosen that $\omega \in R$ if and only if $\omega_n \in R'$.

Remark: Such an $r'$ exists (and is unique), since $r$ is a statement
relative to $\mathcal{T}_{n+1}$.

Proof:

Case 1: $r$ is of the form $x_{n+1} = j$. Write $q$ as a disjunction $q \equiv \bigvee_m q_m$,
where

$$q_m \equiv (x_0 = c_0^{(m)} \wedge \cdots \wedge x_{n-1} = c_{n-1}^{(m)}).$$

For each $m$ such that $\Pr_\pi[q_m \wedge x_n = i] > 0$,

$$\Pr_\pi[r \mid q_m \wedge x_n = i] = \Pr_\pi[x_{n+1} = j \mid q_m \wedge x_n = i] = P_{ij}$$

by Definition 4-1. Hence, since $\Pr_\pi[q \wedge x_n = i] > 0$,

$$\Pr_\pi[r \mid q \wedge x_n = i] = P_{ij} = \Pr_\pi[r \mid x_n = i]$$

by Lemma 4-4. Taking $r'$ as $x_1 = j$, we have

$$P_{ij} = \Pr_i[x_1 = j] = \Pr_i[r'].$$

Case 2: $r$ is of the form $x_{n+1} = c_{n+1} \wedge \cdots \wedge x_m = c_m$, $m > n$. We
have

$$\begin{aligned}
\Pr_\pi[r \mid q \wedge x_n = i]
 &= \Pr_\pi[x_{n+1} = c_{n+1} \mid q \wedge x_n = i] \\
 &\quad \times \Pr_\pi[x_{n+2} = c_{n+2} \mid q \wedge x_n = i \wedge x_{n+1} = c_{n+1}] \\
 &\quad \times \cdots \times \Pr_\pi[x_m = c_m \mid q \wedge x_n = i \wedge \cdots \wedge x_{m-1} = c_{m-1}].
\end{aligned}$$

The general factor on the right is

$$\Pr_\pi[x_{n+k+1} = c_{n+k+1} \mid q \wedge x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}].$$

First, suppose that none of these factors is zero. Then we may apply
Case 1 with $n + k$ in place of $n$ and $q \wedge x_n = i \wedge \cdots \wedge x_{n+k-1} = c_{n+k-1}$
in place of $q$. The $q$'s drop out of the conditions, and the product
of the new conditional probabilities is $\Pr_\pi[r \mid x_n = i]$.

Next, suppose that at least one of the factors is zero; let the first
such factor from the left be

$$\Pr_\pi[x_{n+k+1} = c_{n+k+1} \mid q \wedge x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}].$$

We must show that $\Pr_\pi[r \mid x_n = i] = 0$. If $k = 0$, then by Case 1

$$0 = \Pr_\pi[x_{n+1} = c_{n+1} \mid q \wedge x_n = i] = \Pr_\pi[x_{n+1} = c_{n+1} \mid x_n = i],$$

and hence $\Pr_\pi[r \mid x_n = i] = 0$. If $k > 0$, then

$$\Pr_\pi[q \wedge x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}] > 0,$$

and Case 1 gives

$$\begin{aligned}
0 &= \Pr_\pi[x_{n+k+1} = c_{n+k+1} \mid q \wedge x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}] \\
  &= \Pr_\pi[x_{n+k+1} = c_{n+k+1} \mid x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}].
\end{aligned}$$

Hence $\Pr_\pi[r \mid x_n = i] = 0$.

Finally $r'$ is the statement $x_1 = c_{n+1} \wedge \cdots \wedge x_{m-n} = c_m$ and, since
$\Pr_\pi[x_n = i] \ge \Pr_\pi[q \wedge x_n = i] > 0$, we have

$$\Pr_\pi[r \mid x_n = i] = P_{i c_{n+1}} P_{c_{n+1} c_{n+2}} \cdots P_{c_{m-1} c_m} = \Pr_i[r'].$$

Case 3: $r$ is arbitrary in $\mathcal{F} \cap \mathcal{T}_{n+1}$. The general statement $r$ reduces
to the denumerable union of statements of the type in Case 2, and the
result follows from the complete additivity of the probability measure.



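The lemma to follow is the strong Markov property for the case in
which the time of the present is a fixed time $n$.

Lemma 4-7: Let $\{x_n\}$ be a Markov chain with starting distribution $\pi$,
let $q$ be a statement relative to $\mathcal{F}_n$, let $r$ be a statement relative to
$\mathcal{T}_n$, and suppose $\Pr_\pi[q \wedge x_n = i] > 0$. Then

$$\Pr_\pi[r \mid q \wedge x_n = i] = \Pr_\pi[r \mid x_n = i] = \Pr_i[r'],$$

where $\omega \in R$ if and only if $\omega_n \in R'$.

Proof: Write

$$q \equiv \bigvee_m (x_0 = c_0^{(m)} \wedge \cdots \wedge x_{n-1} = c_{n-1}^{(m)} \wedge x_n = c_n^{(m)}).$$

If we set $q^* \equiv \bigvee_m (x_0 = c_0^{(m)} \wedge \cdots \wedge x_{n-1} = c_{n-1}^{(m)})$, where the
disjunction is taken over just those $m$ such that $c_n^{(m)} = i$, then

$$(q^* \wedge x_n = i) \equiv (q \wedge x_n = i)$$

and $q^*$ is a statement relative to $\mathcal{F}_{n-1}$. In the special case where $r$ is
relative to $\mathcal{F} \cap \mathcal{T}_n$ we may write

$$r \equiv \bigvee_m (x_n = c_n^{(m)} \wedge \cdots \wedge x_N = c_N^{(m)}) \qquad (N \text{ fixed})$$

and

$$r^* \equiv \bigvee_m (x_{n+1} = c_{n+1}^{(m)} \wedge \cdots \wedge x_N = c_N^{(m)})$$

with the second disjunction taken over only those $m$ such that $c_n^{(m)} = i$.
Then

$$\Pr_\pi[r \mid q \wedge x_n = i] = \Pr_\pi[r^* \mid q^* \wedge x_n = i],$$

$$\Pr_\pi[r \mid x_n = i] = \Pr_\pi[r^* \mid x_n = i],$$

and

$$\Pr_i[r'] = \Pr_i[r^{*\prime}].$$

By Lemma 4-6,

$$\Pr_\pi[r^* \mid q^* \wedge x_n = i] = \Pr_\pi[r^* \mid x_n = i] = \Pr_i[r^{*\prime}].$$

Hence

$$\Pr_\pi[r \mid q \wedge x_n = i] = \Pr_\pi[r \mid x_n = i] = \Pr_i[r'].$$

We have thus established the lemma for every $r$ measurable with
respect to $\mathcal{F} \cap \mathcal{T}_n$. But $\mathcal{T}_n$ is the smallest augmented Borel field
containing $\mathcal{F} \cap \mathcal{T}_n$, and by Theorem 1-19 any two measures on $\mathcal{T}_n$
which agree on $\mathcal{F} \cap \mathcal{T}_n$ must agree on all of $\mathcal{T}_n$. Thus

$$\Pr_\pi[r \mid q \wedge x_n = i], \quad \Pr_\pi[r \mid x_n = i], \quad \text{and} \quad \Pr_i[r'],$$

which define such measures as $r$ varies, are equal for every $r$ measurable
with respect to $\mathcal{T}_n$.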



Turning to the general case of the strong Markov property, let $t$ be a
random time. We define $\omega_t$ pointwise to be $\omega_n$ at all points where
$t(\omega) = n$. We do not define $\omega_t$ if $t(\omega) = \infty$. Similarly the outcome
function $x_t$ is defined to be $x_n$ if $t(\omega) = n$, and it is not defined for
$t(\omega) = \infty$.

Definition 4-8: The field $\mathcal{F}_t$ is the Borel field of all sets $A$ such that,
for each $n$, $A \cap \{\omega \mid t(\omega) = n\}$ is in $\mathcal{F}_n$. The tail-field $\mathcal{T}_t$ is the
smallest augmented Borel field containing all truth sets of statements

$$x_t = c_t \wedge \cdots \wedge x_{t+k} = c_{t+k}, \qquad k \ge 0.$$

A statement $q$ relative to $\mathcal{F}_t$ is one such that, for each $n$, the
statement $q \wedge t = n$ depends only on outcomes $x_0, \ldots, x_n$. A statement $r$
relative to $\mathcal{T}_t$ is one whose truth set does not depend on outcomes
before time $t$. Specifically, a set $R$ in $\mathcal{B}$ is in $\mathcal{T}_t$ if and only if whenever
$\omega \in R$ and $\omega'$ is such that $\omega'_t = \omega_t$, then $\omega' \in R$.

We state the strong Markov property as the next theorem. 

Theorem 4-9: Let $\{x_n\}$ be a Markov chain with starting distribution
$\pi$, let $t$ be a random time, let $q$ be a statement relative to $\mathcal{F}_t$, let $r$ be a
statement relative to $\mathcal{T}_t$, and suppose $\Pr_\pi[q \wedge x_t = i] > 0$. Then

$$\Pr_\pi[r \mid q \wedge (x_t = i)] = \Pr_\pi[r \mid x_t = i] = \Pr_i[r'],$$

where $\omega \in R$ if and only if $\omega_t \in R'$.

Proof: We shall prove the theorem for any statement $r$ measurable
with respect to $\mathcal{F} \cap \mathcal{T}_t$. The theorem for general $r$ will then follow,
as in the proof of Lemma 4-7, from the uniqueness half of Theorem 1-19.
Since $x_t \ne i$ when $t = \infty$, we have

$$(q \wedge x_t = i) \equiv \bigvee_{n=0}^{\infty} (q \wedge x_n = i \wedge t = n).$$

We are going to apply Lemma 4-4 with $p$ the statement $r$, with $p_n$ the
statement $q \wedge x_n = i \wedge t = n$, and with $c$ the constant $\Pr_i[r']$. To do
so, we must show that

$$\Pr_\pi[r \mid q \wedge x_n = i \wedge t = n] = \Pr_i[r']$$

whenever $\Pr_\pi[q \wedge x_n = i \wedge t = n] > 0$, and we will have proved that

$$\Pr_\pi[r \mid q \wedge x_t = i] = \Pr_i[r'].$$

The fact that $\Pr_\pi[r \mid x_t = i]$ equals both of these quantities will follow
by taking $q$ to be a tautology.

Thus we first note that $q \wedge t = n$ is measurable with respect to $\mathcal{F}_n$.
In addition there exists a statement $\tilde r$ measurable with respect to $\mathcal{T}_n$
such that

$$(r \wedge t = n) \equiv (\tilde r \wedge t = n);$$

this is so because $r$ is the denumerable union of statements

$$x_t = c_t \wedge \cdots \wedge x_{t+N} = c_{t+N}$$

and we may take $\tilde r$ to be the same union over the statements

$$x_n = c_t \wedge \cdots \wedge x_{n+N} = c_{t+N}.$$

In this notation the statement $r'$ is the union of the statements

$$x_0 = c_t \wedge \cdots \wedge x_N = c_{t+N},$$

and we have that $\omega$ is in the truth set of $\tilde r$ if and only if $\omega_n$ is in the
truth set of $r'$. Hence

$$\begin{aligned}
\Pr_\pi[r \mid q \wedge x_n = i \wedge t = n]
 &= \Pr_\pi[r \wedge t = n \mid q \wedge x_n = i \wedge t = n] \\
 &= \Pr_\pi[\tilde r \wedge t = n \mid q \wedge x_n = i \wedge t = n] \\
 &= \Pr_\pi[\tilde r \mid (q \wedge t = n) \wedge x_n = i] \\
 &= \Pr_\pi[\tilde r \mid x_n = i] \\
 &= \Pr_i[r'],
\end{aligned}$$

the last two equalities following from Lemma 4-7.

An equivalent way of stating the first equality of the conclusion of
the preceding theorem is

$$\Pr_\pi[q \wedge r \mid x_t = i] = \Pr_\pi[q \mid x_t = i] \, \Pr_\pi[r \mid x_t = i].$$

This is the form in which the theorem asserts that if the present is
known, then the past and future are independent.

5. Systems theorems for Markov chains 

As immediate consequences of the strong Markov property, we can
prove two systems theorems for Markov chains. The first states that
if $p$ is a statement depending on outcomes only beyond some random
time $t$, then one may compute $\Pr_\pi[p]$ as if the chain were started with
the initial distribution $\Pr_\pi[x_t = j]$.

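Theorem 4-10: Let $\{x_n\}$ be a Markov chain, and let $p$ be a measurable
statement with truth set $P$ satisfying

(1) $\Pr_\pi[p \wedge (t = \infty)] = 0$, and

(2) there exists a statement $p'$ with truth set $P'$ such that if
$t(\omega) < +\infty$, then $\omega \in P$ if and only if $\omega_t \in P'$.

Then

$$\Pr_\pi[p] = \sum_{k \in S} \Pr_\pi[x_t = k] \, \Pr_k[p'].$$

Proof: By (1) we have

$$\begin{aligned}
\Pr_\pi[p] &= \sum_k \Pr_\pi[\omega \in P \wedge x_t = k] \\
 &= \sum_k \Pr_\pi[\omega_t \in P' \wedge x_t = k] \\
 &= \sum_k \Pr_\pi[x_t = k] \, \Pr_\pi[\omega_t \in P' \mid x_t = k] \\
 &= \sum_k \Pr_\pi[x_t = k] \, \Pr_k[p'] \qquad \text{by Theorem 4-9.}
\end{aligned}$$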

Theorem 4-10, which is a result about probabilities of statements, can 
also be thought of as a result about means of characteristic functions. 
Then Theorem 4-11 to follow becomes a straightforward generalization 
to arbitrary functions. 

Theorem 4-11: Let $\{x_n\}$ be a Markov chain, and let $f$ be a random
variable satisfying

(1) $\Pr_\pi[f \ne 0 \wedge t = \infty] = 0$, and

(2) there exists a random variable $f'$ such that if $t(\omega) < \infty$, then
$f(\omega) = f'(\omega_t)$.

Then

$$M_\pi[f] = \sum_k \Pr_\pi[x_t = k] \, M_k[f'].$$

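Proof: If $f$ assumes negative values, we may prove the result for $f^+$
and $f^-$ separately. We therefore assume $f \ge 0$. Let $p_j^{(m)}$ be the
statement $j/2^m \le f < (j+1)/2^m$, for $1 \le j < m \cdot 2^m$, let $p_0^{(m)}$ be the
statement $0 < f < 1/2^m$, and let $p_{m2^m}^{(m)}$ be the statement $m \le f$. Define
statements $p_j'^{(m)}$ similarly for $f'$. Then $p_j^{(m)}$ and $p_j'^{(m)}$ satisfy the
hypotheses of Theorem 4-10, so that

$$\Pr_\pi[p_j^{(m)}] = \sum_k \Pr_\pi[x_t = k] \, \Pr_k[p_j'^{(m)}].$$

Hence

$$\begin{aligned}
M_\pi[f] &= \lim_m \sum_{j=0}^{m2^m} \frac{j}{2^m} \Pr_\pi[p_j^{(m)}] \\
 &= \lim_m \sum_k \Pr_\pi[x_t = k] \sum_{j=0}^{m2^m} \frac{j}{2^m} \Pr_k[p_j'^{(m)}] \\
 &= \sum_k \Pr_\pi[x_t = k] \lim_m \sum_{j=0}^{m2^m} \frac{j}{2^m} \Pr_k[p_j'^{(m)}] \qquad \text{by monotone convergence} \\
 &= \sum_k \Pr_\pi[x_t = k] \, M_k[f'].
\end{aligned}$$

6. Applications of systems theorems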

The theorems of Section 5 will play an important role in our study of 
denumerable Markov chains. At this time we shall not illustrate the 
full power of the theorems but shall be content instead to use them in 
developing some of the machinery needed for the classification of states 
in Section 7. 

We begin by introducing some notation. Define

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}$$

Let $h_j$ be the statement about a Markov chain that state $j$ is eventually
reached. We have already defined the random variables $n_j$ and $t_j$ for
general stochastic processes (see Section 2-6); $n_j$ is the number of times
in state $j$, and $t_j$ is the time to reach state $j$. Let $f_j^{(k)}$ be the statement
that $t_j(\omega) = k$.

Confining ourselves to Markov chains, we associate the quantities
$\bar h_j$, $\bar n_j$, $\bar t_j$, $\bar f_j^{(k)}$ with $h_j$, $n_j$, $t_j$, and $f_j^{(k)}$. They are defined as follows:

$\bar h_j$: $h_j$ is true for $\omega_1$

$\bar n_j(\omega) = n_j(\omega_1)$

$\bar t_j(\omega) = t_j(\omega_1) + 1$

$\bar f_j^{(k)}$: $\bar t_j(\omega) = k$.

In terms of these quantities, we define a collection of matrices. We
note that, in general, an expression of the form $\{M_i[g_j]\}$ stands for a
matrix.

$$H_{ij} = \Pr_i[h_j], \qquad N_{ij} = M_i[n_j], \qquad F_{ij}^{(k)} = \Pr_i[f_j^{(k)}],$$

$$\bar H_{ij} = \Pr_i[\bar h_j], \qquad \bar N_{ij} = M_i[\bar n_j], \qquad \bar F_{ij}^{(k)} = \Pr_i[\bar f_j^{(k)}].$$

It is trivial to verify that $F^{(0)} = I$, that $\bar F^{(1)} = P$, that $F^{(1)} = P - P_{dg}$,
and that $N = I + \bar N$.



Proposition 4-12: If $P$ is a Markov chain, then

$$N = \sum_{k=0}^{\infty} P^k.$$

Proof: The result follows immediately from Propositions 2-10 and 4-3.

Proposition 4-13: If $P$ is a Markov chain, then $\bar N = PN$ and
$N = I + PN$.

Proof: The second identity follows from the first by adding $I$ to
both sides. To obtain the first one, we apply Theorem 4-11 with
$f = \bar n_j$ and the random time identically one. Then $f' = n_j$, since
$\bar n_j(\omega) = n_j(\omega_1)$ by definition, and thus

$$M_i[\bar n_j] = \sum_k \Pr_i[x_1 = k] \, M_k[n_j] = \sum_k P_{ik} N_{kj},$$

or

$$\bar N = PN.$$
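The identities $N = \sum_k P^k$ and $\bar N = PN$ can be checked numerically on a small example. The sketch below is illustrative only: the matrix `P` is a hypothetical chain chosen with row sums below 1 so that the series converges, and the helper names `mat_mul`, `identity`, and `mean_visits` are not from the text. The series is truncated, so the comparison is made up to a small tolerance.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    """n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mean_visits(p, terms=2000):
    """Approximate N = I + P + P^2 + ... by a truncated series."""
    n = len(p)
    total, power = identity(n), identity(n)
    for _ in range(terms):
        power = mat_mul(power, p)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Hypothetical transition matrix (an assumption for illustration): every row
# sums to less than 1, so the process can disappear and the series converges.
P = [[0.0, 0.5, 0.0],
     [0.3, 0.0, 0.4],
     [0.0, 0.6, 0.2]]

N = mean_visits(P)
PN = mat_mul(P, N)
N_bar = [[N[i][j] - (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]

# Largest entrywise gap between N-bar = N - I and PN.
gap = max(abs(N_bar[i][j] - PN[i][j]) for i in range(3) for j in range(3))
print(gap < 1e-9)
```

For a chain with recurrent states the series diverges in some entries, so this kind of check only makes sense where $N$ is finite.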

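Proposition 4-14: If $P$ is a Markov chain, then

$$H = \sum_{k=0}^{\infty} F^{(k)}, \qquad \bar H = \sum_{k=1}^{\infty} \bar F^{(k)}, \qquad \bar H = PH.$$

Proof: The first two assertions follow from the complete additivity
of $\mu$; we have $h_j \equiv \bigvee_k f_j^{(k)}$ and $\bar h_j \equiv \bigvee_k \bar f_j^{(k)}$ disjointly. For the third
assertion we apply Theorem 4-10 with $p = \bar h_j$ and the random time
identically one. Then $p'$ is the statement $h_j$ and

$$\bar H_{ij} = \Pr_i[\bar h_j] = \sum_k \Pr_i[x_1 = k] \, \Pr_k[p'] = \sum_k P_{ik} H_{kj}.$$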

Proposition 4-15: If $P$ is a Markov chain, then

$$N_{ij} = H_{ij} N_{jj},$$

$$\bar N_{ij} = \bar H_{ij} N_{jj},$$

$$N_{ii} = 1 + \bar H_{ii} N_{ii}.$$

Proof: The third assertion follows from the second and the identity
$N = I + \bar N$ with $j = i$. For the first assertion we apply Theorem 4-11
with $f = n_j$ and the random time equal to $t_j$. It is clear that $n_j(\omega) = n_j(\omega_{t_j})$
if $t_j(\omega) < \infty$. Therefore $f' = n_j$, and

$$\begin{aligned}
N_{ij} = M_i[n_j] &= \sum_k \Pr_i[x_{t_j} = k] \, M_k[n_j] \\
 &= \Pr_i[x_{t_j} = j] \, M_j[n_j] \\
 &= \Pr_i[t_j < \infty] \, M_j[n_j] \\
 &= H_{ij} N_{jj}.
\end{aligned}$$

Similarly, for the second assertion we apply Theorem 4-11 with $f = \bar n_j$,
$f' = n_j$, and the random time equal to $\bar t_j$. By the same kind of
argument, we find

$$M_i[\bar n_j] = \bar H_{ij} N_{jj}.$$
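As a numerical illustration of Propositions 4-14 and 4-15, the following sketch computes $H_{ij}$ as the limit of $\Pr_i[t_j \le K]$ and compares $N_{ij}$ with $H_{ij} N_{jj}$, and $N_{jj}$ with $1 + \bar H_{jj} N_{jj}$, where $\bar H = PH$. The matrix `P` and all function names are assumptions made for the example; both the series for $N$ and the iteration for $H$ are truncated, so equality is tested only up to a tolerance.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mean_visits(p, terms=2000):
    """Approximate N = I + P + P^2 + ... by a truncated series."""
    n = len(p)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]
    for _ in range(terms):
        power = mat_mul(power, p)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

def hitting_probabilities(p, j, sweeps=2000):
    """Column j of H: after K sweeps, h[i] equals Pr_i[t_j <= K]."""
    n = len(p)
    h = [1.0 if i == j else 0.0 for i in range(n)]
    for _ in range(sweeps):
        h = [1.0 if i == j else sum(p[i][m] * h[m] for m in range(n))
             for i in range(n)]
    return h

# Hypothetical chain in which every state is transient (row sums below 1).
P = [[0.0, 0.5, 0.0],
     [0.3, 0.0, 0.4],
     [0.0, 0.6, 0.2]]
n = len(P)
N = mean_visits(P)

worst = 0.0
for j in range(n):
    H_col = hitting_probabilities(P, j)
    # H-bar_jj = sum_m P_jm H_mj, the third assertion of Proposition 4-14.
    H_bar_jj = sum(P[j][m] * H_col[m] for m in range(n))
    for i in range(n):
        worst = max(worst, abs(N[i][j] - H_col[i] * N[j][j]))
    worst = max(worst, abs(N[j][j] - (1.0 + H_bar_jj * N[j][j])))

print(worst < 1e-9)
```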

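Proposition 4-16: Let $p$ be the statement that a Markov chain reaches
state $j$ and then state $k$ with $j \ne k$. Then $\Pr_i[p] = H_{ij} H_{jk}$.

Proof: In the notation of Theorem 4-10, if $t_j$ is taken as the random
time, then $p'$ is the statement $h_k$. The theorem applies and

$$\begin{aligned}
\Pr_i[p] &= \sum_m \Pr_i[x_{t_j} = m] \, \Pr_m[h_k] \\
 &= \Pr_i[x_{t_j} = j] \, \Pr_j[h_k] \\
 &= H_{ij} H_{jk}.
\end{aligned}$$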

In our discussion of Markov chains, we shall make frequent use of
the following notational devices. Let $k$ and $j$ be states of a Markov
chain. By ${}^k n_j(\omega)$ is meant the number of times on the path $\omega$ that
the process is in state $j$ before (and not including) the first time
that the process is in $k$. We define ${}^k \bar n_j$ as the number of visits to $j$
before the process reaches $k$ after time 0. Notice that ${}^j n_j(\omega) = 0$, but
${}^j \bar n_j(\omega)$ is 1 if $\omega$ starts with $j$. For fixed $k$ we introduce the corresponding
matrices ${}^k N$ and ${}^k \bar N$ by

$${}^k N_{ij} = M_i[{}^k n_j], \qquad {}^k \bar N_{ij} = M_i[{}^k \bar n_j].$$

They are related as follows:

$${}^k \bar N_{ij} = \delta_{ij} + M_i[{}^k n_j(\omega_1)].$$

We further define ${}^k H_{ij}$ to be the probability of hitting $j$ before $k$,
having started in $i$; ${}^k \bar H_{ij}$ is the probability of hitting $j$ before hitting $k$
after time 0, having started in $i$.

We will later want a more general notation than ${}^k n_j(\omega)$. By
${}^E n_j(\omega)$ we shall mean the number of times on the path $\omega$ that the
process is in $j$ before it is in any state of the set $E$. It is sometimes
convenient, in this connection, to think in terms of the modified chain
in which the states of $E$ have been made absorbing. Again we have
matrices ${}^E N$, and we also introduce the matrices ${}^E H$ and ${}^E \bar H$ analogously.

If $E$ is a subset of the set of states $S$ for which neither $E$ nor $\tilde E$ is
empty, we shall decompose the $P$ matrix into

$$P = \begin{array}{c|cc} & E & \tilde E \\ \hline E & T & U \\ \tilde E & R & Q \end{array}$$

according to the method discussed at the end of Section 1-1. If $A$
is an arbitrary matrix indexed by the set $S$, we write $A_E$ for the
restriction of $A$ to a matrix indexed only by $E$. As an example, we note
that $P_E = T$.

7. Classification of states 

We introduce a partial ordering on the states $S$ of a Markov chain $P$.
Two states $i$ and $j$ are said to be $R$-related, written $R(i, j)$, if $H_{ij} > 0$,
that is, if it is possible to reach $j$ from $i$. If $R(i, j)$ and $R(j, i)$, we say
that $i$ and $j$ communicate and write $i \sim j$. To see that $R$ is a partial
ordering, we note that

(1) $H_{ii} = 1 > 0$ so that $R(i, i)$,

(2) If $R(i, j)$ and $R(j, k)$, then $R(i, k)$ because $H_{ik} \ge H_{ij} H_{jk} > 0$
by Proposition 4-16.

The reader should verify that $\sim$ is an equivalence relation.

The relation $\sim$ therefore partitions the states of $S$ into equivalence
classes within the ordering, and movement from state to state is within 
a class or upward through the ordering. We do not assert the existence 
of maximal classes; we shall see an example later where no maximal 
classes are present. (The reader should then be able to exhibit an 
example of a chain having no minimal classes.) 
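For a finite chain the relation $R$ and the equivalence classes of $\sim$ can be computed mechanically. The sketch below is an illustration under stated assumptions: it takes $R(i, j)$ to hold exactly when $j$ can be reached from $i$ along transitions of positive probability (this is the content of Proposition 4-17), and the matrix `P` and the function names are hypothetical.

```python
def reachable(p, i):
    """Set of states j with R(i, j), found by following positive entries of p."""
    seen = {i}          # R(i, i) always holds, since H_ii = 1
    stack = [i]
    while stack:
        u = stack.pop()
        for v, prob in enumerate(p[u]):
            if prob > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def communicating_classes(p):
    """Partition the states into classes of mutually reachable states."""
    n = len(p)
    reach = [reachable(p, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = {j for j in reach[i] if i in reach[j]}
        classes.append(sorted(cls))
        assigned |= cls
    return classes

# Hypothetical four-state chain: states 0 and 1 communicate, and from 1 the
# process can move up to the class {2, 3}, from which it never returns.
P = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]

print(communicating_classes(P))   # [[0, 1], [2, 3]]
```

Here $R(0, 3)$ holds but $R(3, 0)$ does not, so movement is within a class or upward through the ordering, as described above.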




[Figure: Flow in a Markov chain]

Proposition 4-17: States $i$ and $j$ are $R$-related if and only if there
exists an $n \ge 0$ for which $(P^n)_{ij} > 0$.

Proof: Suppose $n \ge 0$ is the smallest exponent for which $(P^n)_{ij} > 0$.
Then $F_{ij}^{(n)} = (P^n)_{ij} > 0$, and since $H_{ij} = \sum_n F_{ij}^{(n)}$ we have $R(i, j)$.
Conversely, if $R(i, j)$, then $H_{ij} > 0$ and it must be true that $F_{ij}^{(n)} > 0$ for
some $n$. Thus $(P^n)_{ij} > 0$.

Definition 4-18: A state $i$ is said to be recurrent if $\bar H_{ii} = 1$; it is said to
be transient if $\bar H_{ii} < 1$.

The lemma to follow contains some identities connecting $H$ and $\bar H$
which will be used in the next few propositions. The reader should
study these examples of the use of the strong Markov property in order
to develop his intuition.

Lemma 4-19: The following statements hold:

(1) The probability starting in $i$ of returning to state $i$ at least $k$
times is $(\bar H_{ii})^k$. (Use the convention $0^0 = 1$.)

(2) The probability starting in $i$ of returning at least $k$ times to $i$
before hitting $j$ is $({}^j \bar H_{ii})^k$, provided $i \ne j$.

(3) The probability starting in $i$ of returning to $i$ via $j$ is ${}^i \bar H_{ij} H_{ji}$,
provided $i \ne j$.

(4) The probability starting in $i$ of reaching $j$ for the first time after
$n$ returns to $i$ is $({}^j \bar H_{ii})^n \, {}^i \bar H_{ij}$, provided $i \ne j$.

(5) The probability starting in $i$ of being in state $j$ at least $n$ times
is $H_{ij} (\bar H_{jj})^{n-1}$.

Proof: The proofs are all by Theorem 4-10.

(1) Use induction on $k$. For $k = 0$ the result is trivial; assume that
it holds for $k - 1$. Let $p$ be the statement that the process returns to $i$
at least $k$ times, and let $t = \bar t_i$. Then $p'$ is the statement that the
process returns to $i$ at least $k - 1$ times.

$$\begin{aligned}
\Pr_i[p] &= \sum_j \Pr_i[x_t = j] \, \Pr_j[p'] \\
 &= \Pr_i[x_t = i] \, \Pr_i[p'] \\
 &= \Pr_i[t < \infty] \, \Pr_i[p'] \\
 &= \bar H_{ii} (\bar H_{ii})^{k-1} \qquad \text{by inductive hypothesis.}
\end{aligned}$$

(2) If $i \ne j$, the result is the same as (1) for the chain in which the
single state $j$ has been made absorbing (see Example 2, Section 4-2).

(3) Let $p$ be the statement that the process returns to $i$ via $j$, and let
$t$ be the time that $j$ is reached if $j$ is reached before a return to $i$, or $+\infty$
if $j$ is not reached before $i$. Then $p'$ is the statement that $i$ is reached,
and

$$\begin{aligned}
\Pr_i[p] &= \sum_k \Pr_i[x_t = k] \, \Pr_k[p'] \\
 &= \Pr_i[x_t = j] \, H_{ji} \\
 &= \Pr_i[t < \infty] \, H_{ji} \\
 &= {}^i \bar H_{ij} H_{ji}.
\end{aligned}$$

(4) The argument is the same as in (3). Use the systems theorem
with $t$ equal to the time of the $n$th return to $i$ if the return occurs before
$j$ is reached, or $+\infty$ otherwise.

(5) The proof is by induction on $n$ and is the same as in (1) except
that the random time becomes $t = t_j$.

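Proposition 4-20: State $i$ is transient if and only if $N_{ii} < +\infty$. Then
$N_{ii} = 1/(1 - \bar H_{ii})$.

Proof:

$$N_{ii} = \sum_{k=1}^{\infty} k \cdot \Pr_i[n_i = k],$$

which upon rearrangement of terms becomes

$$\sum_{k=1}^{\infty} \sum_{m=k}^{\infty} \Pr_i[n_i = m],$$

which by complete additivity is

$$\sum_{k=1}^{\infty} \Pr_i[n_i \ge k] = \sum_{k=1}^{\infty} (\bar H_{ii})^{k-1} \qquad \text{by conclusion (1) of Lemma 4-19.}$$

The right side is finite if and only if $\bar H_{ii} < 1$, in which case it sums to $1/(1 - \bar H_{ii})$.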

Corollary 4-21: If $j$ is a transient state, then $N_{ij} < \infty$ for all states $i$
in the chain, and 

Proof: From Proposition 4-15 we have

$$N_{ij} = H_{ij} N_{jj} \le N_{jj}.$$

The result now follows from Proposition 4-20.

We are now in a position to put together the ideas of recurrence and
transience with the partial ordering $R$ and the equivalence relation $\sim$.
We need two lemmas before we can prove our fundamental result — 
that all states in an equivalence class are of the same type, recurrent or 
transient. 

Lemma 4-22: If $i \ne j$ and $R(i, j)$, then ${}^j \bar H_{ii} < 1$ and ${}^j N_{ii} < +\infty$.

Proof: Suppose ${}^j \bar H_{ii} = 1$. By conclusion (2) of Lemma 4-19 the
probability of returning $n$ times before hitting $j$ is $({}^j \bar H_{ii})^n = 1$. Hence,
by Proposition 2-6, there is probability one for returning infinitely often
before hitting $j$, in contradiction to the relation $R(i, j)$. For the second
half of the lemma, we have, as in Proposition 4-20,

$${}^j N_{ii} = \sum_{n=0}^{\infty} ({}^j \bar H_{ii})^n.$$

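Lemma 4-23: If $i$ is recurrent and $R(i, j)$, then $H_{ij} = 1$ and $H_{ji} = 1$.

Proof: The result is obvious if $i = j$. If $i \ne j$, consider returning to
$i$ with and without first reaching $j$. By conclusion (3) of Lemma 4-19,

$$1 = \bar H_{ii} = {}^j \bar H_{ii} + {}^i \bar H_{ij} H_{ji}.$$

Since ${}^j \bar H_{ii} + {}^i \bar H_{ij} \le 1$, this equation is a contradiction unless ${}^j \bar H_{ii} = 1$
or $H_{ji} = 1$. The first alternative is ruled out by Lemma 4-22. Thus
$H_{ji} = 1$ and ${}^i \bar H_{ij} = 1 - {}^j \bar H_{ii}$. Next, since $i$ is recurrent, one may
compute $H_{ij}$ by summing the probabilities of reaching $j$ for the first
time after $n$ returns to $i$, where $n = 0, 1, 2, \ldots$. By conclusion (4) of
Lemma 4-19,

$$H_{ij} = \sum_{n=0}^{\infty} ({}^j \bar H_{ii})^n \, {}^i \bar H_{ij} = (1 - {}^j \bar H_{ii}) \sum_{n=0}^{\infty} ({}^j \bar H_{ii})^n = 1.$$

The last equality holds, since ${}^j \bar H_{ii} < 1$ by Lemma 4-22.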

Proposition 4-24: All states in an equivalence class are of the same 
type, recurrent or transient. 

Proof: It is sufficient to show that if one state in an equivalence
class is recurrent, so are all others. Let $i$ be a recurrent state, and
suppose $j \sim i$, $j \ne i$. Then $\bar H_{jj} \ge H_{ji} H_{ij}$, since the probability of
returning is at least as great as the probability of returning via $i$.
(We have used an argument familiar from Proposition 4-16 to compute
the latter.) Hence $\bar H_{jj} = 1$ by Lemma 4-23.

Corollary 4-25: If $i$ is recurrent and $i \sim j$, then $H_{ij} = H_{ji} = 1$.

Proof: The corollary follows from Lemma 4-23.

Because of Proposition 4-24 we are free to speak of transient and 
recurrent classes of states. We shall mention a few simple results 
about classes of states. By a closed class we mean one that it is 
impossible to leave. A process cannot disappear when it is in a closed 
class. 







Proposition 4-26: Recurrent classes are closed and maximal with 
respect to the partial ordering R. 

Proof: It is sufficient to prove that a recurrent class $S'$ is closed,
since closed classes are clearly maximal. Suppose the class can be left,
say from a state $j \in S'$. If $k$ is a state outside $S'$ for which $P_{jk} > 0$,
then it is not true that $R(k, j)$ because $j$ and $k$ do not communicate.
Thus $\bar H_{jj} \le 1 - P_{jk} < 1$ and $j$ is not recurrent.

Proposition 4-27: If a Markov chain is started in a recurrent class $S'$,
then the chain is in every state of $S'$ infinitely often with probability
one. In particular, if $i$ and $j$ are in $S'$, then $N_{ij} = +\infty$.

Proof: Suppose the chain is started in state $i$. Then, by conclusion
(5) of Lemma 4-19, the probability of being in state $j$ at least $n$ times is
$H_{ij} (\bar H_{jj})^{n-1} = 1$. By Proposition 2-6 the chain is in state $j$ infinitely
often with probability one. Again by Proposition 2-6 it is in every
state infinitely often with probability one.

Proposition 4-28: A Markov chain is in a finite subset of transient 
states only finitely often, with probability one. 

Proof: If the chain were in a finite set $S'$ infinitely often with positive
probability, it would be in one state $j$ of $S'$ infinitely often with positive
probability. Such an occurrence would imply that $N_{ij}$ is infinite for
some $i$, in contradiction to Corollary 4-21 if $j$ is transient.

We single out two kinds of Markov chains for special attention. We 
note that every absorbing state forms a one-element recurrent class, and 
conversely. 

Definition 4-29 : A Markov chain is said to be a recurrent chain if its 
states comprise a single equivalence class and if that class is recurrent. 
A chain is called a transient chain if all of its recurrent states are 
absorbing. 

If P is an arbitrary Markov chain with r recurrent classes, then all 
properties of P can be deduced from the properties of one transient and 
r recurrent chains. This assertion follows from the observations: 

(1) If the process $P$ starts in a recurrent state $j$, movement from state
to state is confined to the single equivalence class to which $j$ belongs.
The properties of the chain started in $j$ are the properties of a chain
while it is in one recurrent class; they are thus the properties of a
recurrent chain.

(2) If the process $P$ starts in a transient state, its behavior while in
transient states is the same as the behavior of the transient chain $P'$
obtained from $P$ by making all recurrent states absorbing. If $P$ enters
a recurrent state, then $P'$ becomes absorbed. And after $P$ has entered
a recurrent state, its properties are those of a recurrent chain. Thus the
properties of $P$ may be studied by considering the one transient chain
$P'$ and the $r$ separate recurrent chains.

Because of these observations, we shall restrict our discussion in 
subsequent chapters to Markov chains which are either transient or 
recurrent. 

The reader should notice that every chain whose states form only one 
equivalence class is either a transient chain or a recurrent chain. 
Shortly we shall examine the basic example, in which all pairs of states 
communicate, to determine when it is transient and when it is recurrent. 

First we discuss some properties of maximal classes for a moment. 
Not every chain has maximal classes; a tree process, for example, 
consists of infinitely many transient classes of one state each. None 
of the classes is maximal. Even if a chain does have a maximal class, 
that class does not have to be closed. The process may have a positive 
probability of disappearing from some state in the maximal class. 

Nor is it true that all closed classes are recurrent. An additional 
condition is needed. 



Proposition 4-30 : All closed equivalence classes consisting of finitely 
many states are recurrent. 



Proof: Let the states be the first $n$ positive integers, and suppose the
class is transient. Then $N_{ij}$ is finite for every $i$ and $j$ in the class.
Therefore

$$c = M_i\Bigl[\sum_{j=1}^{n} n_j\Bigr] = \sum_{j=1}^{n} N_{ij}$$

is finite. But c is the mean total number of steps taken in the class, 
and c is infinite because the class is closed. This contradiction 
establishes the proposition. 
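For a finite chain, Propositions 4-26 and 4-30 together give a simple test: an equivalence class is recurrent exactly when it is closed. A minimal sketch of that test follows; the matrix `P`, the list `classes` (its communicating classes, written down by inspection), and the function name `is_closed` are assumptions made for the example. A class is treated as closed when every one of its rows keeps total probability 1 inside the class, so that the process can neither move out nor disappear.

```python
def is_closed(p, cls, tol=1e-12):
    """True when each state of cls sends all of its probability mass into cls."""
    members = set(cls)
    return all(abs(sum(p[i][j] for j in members) - 1.0) <= tol for i in cls)

# Hypothetical four-state chain with communicating classes {0, 1} and {2, 3}.
P = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
classes = [[0, 1], [2, 3]]

# Finite closed classes are recurrent (Proposition 4-30); classes that can be
# left are not recurrent (Proposition 4-26), hence transient.
labels = {tuple(c): ("recurrent" if is_closed(P, c) else "transient")
          for c in classes}
print(labels)   # {(0, 1): 'transient', (2, 3): 'recurrent'}
```

The test does not extend to infinite classes, where a closed class may be transient.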



To see that infinite closed equivalence classes need not be recurrent,
we consider the basic example, whose states form a single equivalence
class. Let $\bar H_{00}^{(n)}$ be the probability that the chain, started in state 0,
returns to 0 at some time up to and including time $n$. Then

$$\bar H_{00} = \lim_n \bar H_{00}^{(n)}.$$

But

$$1 - \bar H_{00}^{(n)} = p_1 p_2 \cdots p_n = \beta_n,$$

since a single step other than away from zero returns the process to
zero at once. Now in order for the process to be recurrent it is necessary
and sufficient that $\bar H_{00} = 1$ or that

$$\lim_n \beta_n = \lim_n (1 - \bar H_{00}^{(n)}) = 0.$$

The reader should be able to construct examples where $\lim_n \beta_n = 0$ and
where $\lim_n \beta_n \ne 0$. Thus the basic example may be either transient or
recurrent.
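Two concrete choices of the step probabilities illustrate both possibilities. The sketch below assumes that $p_i$ denotes the probability of the $i$th consecutive step away from zero, so that the probability of no return by time $n$ is the product $p_1 p_2 \cdots p_n$; the two sequences and the function names are hypothetical examples, not taken from the text. Exact rational arithmetic is used so that the closed forms can be compared directly.

```python
from fractions import Fraction

def beta(p, n):
    """Product p(1) * p(2) * ... * p(n): probability of no return to 0 by time n."""
    prod = Fraction(1)
    for i in range(1, n + 1):
        prod *= p(i)
    return prod

def recurrent_p(i):
    """p_i = i / (i + 1): the product telescopes to 1 / (n + 1), which tends to 0."""
    return Fraction(i, i + 1)

def transient_p(i):
    """p_i = 1 - 1/(i + 1)^2: the product is (n + 2) / (2(n + 1)), which tends to 1/2."""
    return 1 - Fraction(1, (i + 1) ** 2)

for n in (1, 10, 100):
    print(n, beta(recurrent_p, n), beta(transient_p, n))
```

With the first sequence the chain returns to 0 with probability one; with the second the return probability is $1 - 1/2 = 1/2$, so the chain is transient.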

8. Problems 

1. Find an expression analogous to that in Proposition 4-3 for $\Pr_\pi[x_n = j]$
in a Markov process.

2. Let $p_m$ be the Poisson distribution with mean $m$ on the non-negative
integers. A game is played as follows: A random integer $n_1$ is selected
with probabilities determined by $p_1$. A second random integer $n_2$ is
selected with probabilities determined by $p_{n_1}$. The $t$th random integer
is selected with probabilities determined by $p_{n_{t-1}}$. Prove that with
probability one the integer 0 is eventually selected.

3. Show that if $h \ge 0$ is a column vector for which $P^n h$ converges, then the
limit function is non-negative superregular.

4. Let $j$ be an absorbing state. Prove that the probability starting at $i$ of
ever reaching $j$ is a regular function.

5. Show that an independent trials process is a Markov chain in which
$P_{ij}$ is independent of $i$. Let 0 be any fixed state and let $t$ be any stopping
time. Show that $\Pr_\pi[x_{t+1} = j] = P_{0j}$, and give an example to show that
$\Pr_\pi[x_t = j]$ does not have to equal $P_{0j}$.

6. If the symmetric random walk in 3 dimensions is started at the origin,
the probability of being at the origin after $n$ steps is 0 if $n$ is odd and is of
the order of magnitude of $n^{-3/2}$ for $n$ even. Prove that the probability
of returning to the origin is less than 1.

7. Consider the following random walk in the plane. If the process is not
on an axis, it is equally likely to move to any of the four neighboring
states. If it is at the origin, it stays at the origin. Otherwise, on the
$x$-axis it takes a step away from the origin, whereas on the $y$-axis it takes
a step toward the origin. Give a complete classification of the states.

8. Let $j$ be a transient state in a closed class. Prove that there must be a
state $i$ in the class such that $H_{ij} < 1$.

9. Prove that every tree-process is a transient chain and that each equiv- 
alence class of states is a unit-set. 

10. Prove or disprove: In a chain with a minimal class and with no closed 
class, there is no non-zero non-negative regular measure. 

Problems 11 to 14 refer to a reflecting random walk, that is, a random walk 
on the non-negative integers with go = 

11. Prove that the only regular functions are constants.

12. Let

$$\beta_i = p_0 p_1 \cdots p_{i-1}, \qquad \gamma_i = q_1 q_2 \cdots q_i, \qquad \alpha_i = c\beta_i/\gamma_i.$$

Show that, for any choice of the constant $c$, $\alpha$ is a regular measure.

13. Show that all regular measures are of the form given in the previous 
problem. 

14. Show that $\alpha_j P_{ji}/\alpha_i = P_{ij}$ for all $i$ and $j$.

Problems 15 to 18 refer to a branching process and use the notation of 
Sections 2 and 3 of the text. 



15. Show that the roots of the equation $\varphi(r) = r$ satisfy the following
conditions:

(a) There is an $r < 1$ if $m > 1$.

(b) There is an $r > 1$ if $m < 1$.

(c) $r = 1$ is the only root if $m = 1$.

16. Show that $\{r^{x_n}\}$ is a martingale if and only if $\varphi(r) = r$.

17. Show that $\{x_n\}$ is a martingale if and only if $m = 1$.

18. What condition on m will assure that the branching process has positive 
probability of survival (of not dying out) ? 

Problems 19 to 24 concern space-time processes and martingales. If $P$ is a
Markov chain with state space $S$, we define the space-time process to be a
Markov chain whose states are pairs $(i, n)$, where $i$ is in $S$ and $n$ is a
non-negative integer, and which moves from $(i, n)$ to $(j, n + 1)$ with probability
$P_{ij}$.

19. Prove that any space-time process is transient. What can be said about
classification of states?

20. Prove that if $f(i, n)$ is a finite-valued non-negative regular function for
the space-time process, then $f(x_n, n)$ is a martingale for the process $P$
started at a given state 0.

21. Specialize to the case of sums of independent random variables on the
integers with $p_k = 0$ for $k < 0$. Define $\varphi(t) = \sum_k p_k t^k$ for all $t \ge 0$ for
which the right side is finite. Show that $\varphi(t)$ is defined at least for
$0 \le t \le 1$. Fix a $t$ for which $\varphi(t)$ is defined and put

$$f(i, n) = \frac{t^i}{\varphi(t)^n}.$$

Prove that $f(i, n)$ is regular for the space-time process.

22. In Problem 21 show that $f(x_n, n)$ converges a.e. if the process is started
at 0.



23. Specialize further to the case where $p_0 = p_1 = \tfrac{1}{2}$, and define, for
$0 < t < 1$,

$$g(i, n) = 2^n t^i (1 - t)^{n-i}.$$

Show by change of variable in Problem 22 that $g(x_n, n)$ converges almost
everywhere in the process started at 0.

24. Using only the result of Problem 23, prove that if $p$ is any number
between 0 and 1, not equal to $\tfrac{1}{2}$, then the probability that $x_n = [np]$ for
infinitely many $n$ is 0. Here $[np]$ is the nearest integer to $np$.

1. Properties of transient chains

Recall that a transient Markov chain is a Markov chain all of whose
recurrent states are absorbing. Its transition matrix satisfies
$P1 \le 1$. For any transient state $j$ in the chain, we have seen that
$\bar H_{jj} < 1$ and $N_{ij} < +\infty$ for every $i$. If $E$ is any set of states, we can
put the transition matrix in the canonical form

$$P = \begin{array}{c|cc} & E & \tilde E \\ \hline E & T & U \\ \tilde E & R & Q \end{array}$$

In the special case in which $P$ is a transient chain and $E$ is the set of
absorbing states, we find that $T = I$ and $U = 0$. (If there are no
absorbing states, we agree to write $P = Q$. We shall assume that not
all states are absorbing, however.) Thus, for a transient chain,

$$P = \begin{pmatrix} I & 0 \\ R & Q \end{pmatrix}.$$

The matrices $R$ and $Q$ for a transient chain will always be associated
with this standard decomposition. We observe that $Q$ itself is the
transition matrix for a transient chain and that this chain has only 
transient states. Some authors actually define a transient chain to be 
one with all states transient. However, in the study of these chains, 
it is often convenient to add absorbing states to ensure P1 = 1 . And 
as we saw in Chapter 4, the decomposition of general Markov chains 
into transient and recurrent chains depends on allowing absorbing 
states in transient chains. For these reasons we have adopted the 
slightly more general definition of transient chain which permits 
absorbing states. 

Let $P$ be the transition matrix of a transient chain, and consider the
quantity $N_{ij}$, the mean number of times in state $j$ when the process is
started in state $i$. If $j$ is an absorbing state, then this quantity is
infinite if $j$ can be reached from $i$ with positive probability and 0 otherwise.
If $i$ is absorbing, it is 0 unless $i = j$, and then it is infinite.
Hence $N_{ij}$ is of interest only when $i$ and $j$ are transient. Thus we shall
agree to restrict $N$ to these entries: The matrix so restricted is called
the fundamental matrix for the chain. We shall show that the restricted
matrix is the $N$ matrix for the chain determined by $Q$. In
what follows, $N$ always denotes the restricted matrix associated with $P$.

Lemma 5-1: If $P$ is a transient chain and if $E$ is the set of transient
states, then $(P^k)_E = Q^k$.

Proof: We readily verify by induction that

$$P^k = \begin{array}{c|cc} & \tilde E & E \\ \hline \tilde E & I & 0 \\ E & (I + Q + \cdots + Q^{k-1})R & Q^k \end{array}$$

and the result follows at once.
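The block structure in Lemma 5-1 is easy to confirm on a small example. The sketch below uses a hypothetical chain (a symmetric random walk on the states 0 to 4 with 0 and 4 absorbing) and helper names of our own choosing; it restricts $P^k$ to the transient states and compares the result with $Q^k$ in exact rational arithmetic.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_pow(a, k):
    """k-th power of a square matrix by repeated multiplication."""
    n = len(a)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result

half = Fraction(1, 2)
# Hypothetical transient chain: states 0 and 4 are absorbing.
P = [[Fraction(x) for x in row] for row in [
    [1, 0, 0, 0, 0],
    [half, 0, half, 0, 0],
    [0, half, 0, half, 0],
    [0, 0, half, 0, half],
    [0, 0, 0, 0, 1],
]]
transient = [1, 2, 3]
Q = [[P[i][j] for j in transient] for i in transient]

k = 5
Pk = mat_pow(P, k)
restricted = [[Pk[i][j] for j in transient] for i in transient]
print(restricted == mat_pow(Q, k))   # True
```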

Proposition 5-2: 

\[ N = \sum_{k=0}^{\infty} Q^k. \]

Proof: For transient states i and j, we have in the P-process 

\[ N_{ij} = \sum_k (P^k)_{ij} = \sum_k (Q^k)_{ij} \]

by Proposition 4-12 and Lemma 5-1. 

Proposition 5-3: N is finite-valued, and $\lim_{k \to \infty} Q^k = 0$. 

Proof: $N_{ij}$ in the P-process is finite when j is transient; hence N is 
finite-valued. Therefore $\lim_k Q^k = 0$ by Proposition 5-2. 

We recall that $\bar{N}_{ij} = M_i[\bar{n}_j]$, and $N = I + \bar{N}$. Hence 
$\bar{N} = \sum_{k=1}^{\infty} Q^k$. 

Proposition 5-4: If P is a transient chain, then 

\[ \bar{N} = QN, \]
\[ N = I + QN, \]
\[ N_{ij} = H_{ij} N_{jj}, \]
\[ \bar{N}_{ij} = \bar{H}_{ij} N_{jj}, \]
\[ N_{jj} = 1 + \bar{H}_{jj} N_{jj}, \]
\[ N_{jj} = 1/(1 - \bar{H}_{jj}). \]

Proof: The first two assertions are restatements of Proposition 4-13 
for the case where Q is our transition matrix. The last four results are 
a restatement of Proposition 4-15. 

Note that the conclusions of Proposition 5-4 show how to compute 
N from H and $\bar{H}$. Our next result establishes a method of finding N 
without using the H-matrix: For finite matrices the knowledge that N 
is a (two-sided) inverse of I - Q is sufficient to determine N uniquely, 
but for infinite matrices it is not. For if r is a Q-regular column vector 
and $\beta$ is a Q-regular row vector, then $N + r\beta$ is a second two-sided 
inverse of I - Q. We shall see that such regular vectors r and $\beta$ often 
exist. 

In Section 2 we shall obtain a refinement of Proposition 5-5 by proving 
that N is the unique minimum non-negative inverse of I - Q on 
each side. 

Proposition 5-5: $N(I - Q) = (I - Q)N = I$ and $QN = NQ \le N$. 
In particular, every row of N is a Q-superregular measure, and every 
column of N is a Q-superregular function. 

Proof: The second and third assertions follow from the first, and 
QN = N - I by Proposition 5-4. Also NQ = N - I by Proposition 
5-2 and monotone convergence. Since N has finite entries, the first 
assertion follows. 

If P is a transient chain with a non-empty set $\tilde{E}$ of absorbing states, 
we define the absorption matrix B to have index sets E and $\tilde{E}$ and to 
have entries 

\[ B_{ij} = \Pr_i[\text{process is absorbed at } j]. \]

The B-matrix is not square; it has the same index sets as the R-matrix. 

Proposition 5-6: If P is a transient chain with a non-empty set of 
absorbing states, then B = NR. 

Proof: Let i be transient and let j be absorbing. By Theorem 4-10 
with the random time equal to the constant n and with the statement 
p taken as the assertion that the process is absorbed at j on the (n + 1)st 
step, we have 

\[ \Pr_i[\text{process is absorbed at } j \text{ on the } (n+1)\text{st step}] = \sum_{k \in E} (P^n)_{ik} P_{kj} = \sum_{k \in E} (Q^n)_{ik} R_{kj}. \]

Summing on n, we find 

\[ B = \sum_{n=0}^{\infty} (Q^n R), \]

which by monotone convergence 

\[ = \Big( \sum_{n=0}^{\infty} Q^n \Big) R = NR. \]
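For a finite chain, the identity B = NR can be checked by direct computation. The sketch below is an illustration only and is not part of the original text: the helper names `fundamental_matrix` and `matmul` and the small two-state example are assumptions, and N is obtained as the inverse of I - Q, which determines N uniquely only in the finite case.

```python
from fractions import Fraction

def fundamental_matrix(Q):
    """Return N = (I - Q)^{-1} by Gauss-Jordan elimination over exact fractions."""
    m = len(Q)
    # Build the augmented matrix [I - Q | I].
    A = [[Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(m)]
         + [Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    for col in range(m):
        # Pick a row with a non-zero pivot and move it into place.
        piv = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [x / pivot for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[m:] for row in A]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical example: transient states {1, 2}, absorbing states {0, 3},
# with a step to either neighbour taken with probability 1/2.
half = Fraction(1, 2)
Q = [[0, half], [half, 0]]   # transient -> transient
R = [[half, 0], [0, half]]   # transient -> absorbing
N = fundamental_matrix(Q)
B = matmul(N, R)
```

In this example each row of B sums to 1, so absorption is certain from either transient state.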



As a result, we see from the proof of Lemma 5-1 that if P is a transient 
chain, then 

\[ \lim_{k \to \infty} P^k = \begin{pmatrix} I & 0 \\ B & 0 \end{pmatrix}. \]



Let P be an arbitrary Markov chain, let E be a subset of the set of 
states, and let $s^E$ be the statement that the process is in states of E 
infinitely often. Define the vector $s^E$ by $s_i^E = \Pr_i[s^E]$. 

Proposition 5-7: For any subset E of states in a Markov chain P, 
$s^E$ is a P-regular function. 

Proof: Letting p be the statement $s^E$ and taking the random time 
to be identically one, we see that p' in Theorem 4-10 is also $s^E$ and that 

\[ \Pr_i[s^E] = \sum_k P_{ik} \Pr_k[s^E], \]

or 

\[ s^E = P s^E. \]



For any Markov chain P we define a hitting vector $h^E$ and an escape 
vector $e^E$ by 

\[ h_i^E = \Pr_i[\text{process eventually reaches } E] \]

and 

\[ e_i^E = \Pr_i[\text{process goes on first step from } E \text{ to } \tilde{E} \text{ and then never returns to } E]. \]

We notice that if $i \in E$, then $h_i^E = 1$, and that if $j \in \tilde{E}$, then $e_j^E = 0$. 

The absorption matrix $B^E$ for the set E is defined to be a square 
matrix with index set the set of all states and with entries defined by 

\[ B_{ij}^E = \Pr_i[\text{process at some time enters } E \text{ and first entry is at state } j]. \]

We see that the $B^E$-matrix is computed by finding the entries of the 
B-matrix for the process with states of E made absorbing. Specifically, 
if E is the set of all absorbing states, then 

\[ B^E = \begin{pmatrix} I & 0 \\ B & 0 \end{pmatrix}, \]

where the first block of rows and columns is indexed by E and the second 
by $\tilde{E}$. 

The matrices $s^E$, $e^E$, and $B^E$ are interrelated as in the following 
proposition, whose proof is left to the reader. 

Proposition 5-8: Let P be an arbitrary Markov chain. Then 

(1) $h^E = B^E 1$. 

(2) $h^E = e^E + Ph^E$, and hence $e^E = (I - P)h^E$. 

(3) $s^E = 1$ if and only if $h^E = 1$ and P1 = 1. 

(4) If $E \subset F$, then $h^E \le h^F$ and $s^E \le s^F$. 

(5) $s^E = B^E s^E$. 

(6) If $E \subset F$, then $B^F B^E = B^E$. 

(7) $s^E = \lim_{n} P^n h^E$. 

2. Superregular functions 

Superregular measures and functions were defined in Section 4-3; a 
vector h is P-superregular if $h \ge Ph$. Let P be a transient chain, and 
let Q be the restriction of P to transient states. As we have seen 
before, Q is a transition matrix. Our object in this section is to obtain 
a standard decomposition of non-negative Q-superregular functions 
and to use it in a consideration of the solutions to the equation 
$(I - Q)x = f$. Our results will hold equally well for Q-superregular 
measures, but we shall not supply the proofs. A way of rigorously 
transforming theorems and proofs about functions into theorems and 
proofs about measures will emerge later when we discuss duality. 
Generalizations of the present results will arise in the study of 
potential theory. 

The later transformation of theorems about functions into theorems 
about measures by duality will require the existence of a positive 
finite-valued Q-superregular measure. Any row of N will suffice if all 
pairs of transient states communicate, but if not, we proceed as follows: 
Number the states, beginning with 1, and take 

\[ \beta = \sum_i 2^{-i} N^{(i)}, \]

where $N^{(i)}$ is the i'th row of N. It is clear that $\beta$ is superregular 
because it is the sum of non-negative superregular measures; $\beta$ is 
positive because $\beta_j \ge 2^{-j} N_{jj} \ge 2^{-j} > 0$. Finally $\beta$ is 
finite-valued because 

\[ \beta_j = \sum_i 2^{-i} N_{ij} = \sum_i 2^{-i} H_{ij} N_{jj} \le \sum_i 2^{-i} N_{jj} < \infty. \]

i i i 

The lemma and theorem to follow hold for arbitrary Markov chains. 
In the transient case they will most often be applied to the chain Q. 
The theorem has an analog in classical potential theory, but we 
postpone a discussion of this point until the end of Section 8-1 after 
we have introduced Markov chain potentials. 

Lemma 5-9: Let P be any Markov chain and let $N = \sum_n P^n$. If Nf 
is well defined and finite-valued, then $(I - P)(Nf) = f$. 

Proof: Write $f = f^+ - f^-$. Then $Nf^+$ and $Nf^-$ are both 
finite-valued by hypothesis. Since PN + I = N, we have $PN \le N$ and 
hence $PNf^+ \le Nf^+$ and $PNf^- \le Nf^-$. Therefore, by Corollary 1-5, 

\[ \begin{aligned} (I - P)(Nf) &= Nf - P(Nf) = Nf - (PN)f \\ &= Nf - (N - I)f \\ &= f. \end{aligned} \]

Theorem 5-10: Let P be any Markov chain and let $N = \sum_n P^n$. 
Any non-negative P-superregular finite-valued function h has a unique 
representation h = Nf + r, where r is regular. In the representation 
f and r are both non-negative, and f = (I - P)h. 

Proof: Since h is P-superregular, 

\[ h \ge Ph \ge P^2 h \ge \cdots \ge 0. \]

Thus $P^n h$ converges to a non-negative function r. By the Dominated 
Convergence Theorem, 

\[ Pr = P(\lim P^n h) = \lim P^{n+1} h = r. \]

Hence r is regular. Also 

\[ h = P^{n+1} h + (I + P + \cdots + P^n)(h - Ph). \]

Since $h - Ph \ge 0$, we may apply monotone convergence in passing to 
the limit on n; we obtain 

\[ h = r + N(h - Ph). \]

Set f = h - Ph, and existence follows. 

For uniqueness, suppose that h = r' + Nf' with r' regular. Then 
Nf and Nf' are finite-valued since h is. Multiplying the equation 

\[ r + Nf = r' + Nf' \]

through by I - P and applying Lemma 5-9, we obtain 

\[ f = f'. \]

Hence also r = r'. 

We return now to the special case of transient chains, where 
$N = \sum_n Q^n$. A solution g to an equation is the minimum non-negative 
solution if whenever h is a non-negative solution, we have $h \ge g \ge 0$. 

Proposition 5-11: If $f \ge 0$ and if Nf is finite, then Nf is the minimum 
non-negative solution of $(I - Q)x = f$. 

Proof: By Lemma 5-9, Nf is a solution. Let x be any non-negative 
solution. Then x is finite-valued and superregular. By Theorem 
5-10, x = Nf + r where $r \ge 0$. Hence $x \ge Nf$. 

It follows that N is the minimum non-negative right inverse of 
(I - Q). To prove that the jth column of N is minimum, define f by 
$f_i = \delta_{ij}$ and then apply Proposition 5-11. After the analog of 
Proposition 5-11 for measures has been established, we find similarly that 
N is the minimum non-negative left inverse of (I - Q). 

3. Absorbing chains 

A class of Markov chains of special interest is the class of absorbing 
chains. We shall use the material developed in the two preceding 
sections to establish the basic facts about absorbing chains. 

Definition 5-12: A Markov chain P is said to be absorbing if, for every 
starting state, the probability of ending in an absorbing state is one. 

If P is a Markov chain containing a recurrent nonabsorbing state i, 
then the process cannot be absorbed if it is started in state i. That is, 
all absorbing chains are transient chains. It is not true, however, that 
all transient chains are absorbing. The property P1 = 1 is a necessary 
condition. But even it is not sufficient, since the basic example 
can be transient but is never absorbing. 

The proposition to follow is the special case of the identity $B^E 1 = h^E$ 
in which E is the set of all absorbing states. 

Proposition 5-13: If P is a transient chain, then P is absorbing if and 
only if B1 = 1. 

The next two propositions give two ways in which absorbing chains 
arise. 

Proposition 5-14: If P is a finite transient chain such that P1 = 1, 
then P is absorbing. 

Proof: 

\[ \begin{aligned} B1 &= (NR)1 = N(R1) \\ &= N[(I - Q)1] \quad \text{since } (R1 + Q1)_i = (P1)_i = 1 \\ &= [N(I - Q)]1 \quad \text{by Corollary 1-6} \\ &= 1. \end{aligned} \]



Let $a(\omega)$ be the time on the path $\omega$ of a chain P that absorption takes 
place. If the process is not absorbed along $\omega$, define $a(\omega) = +\infty$. 
Since $a(\omega) = \sum_j n_j(\omega)$, where the sum is taken over the transient states, 
we see that a is measurable and we conclude that a is a random time. 
Define the column vector a by $a_i = M_i[a]$. The vector a is indexed 
by the transient states. It is clear that the chain P is absorbing if and 
only if a is finite a.e. 

Proposition 5-15: If P is a recurrent chain and if ${}^E P$ is the Markov 
chain obtained by making a non-empty set E of states absorbing, then 
${}^E P$ is absorbing. 

Proof: Let $j \in E$. Since $H_{ij} = 1$ for every i, $t_j(\omega)$ is finite almost 
everywhere. But $a(\omega) \le t_j(\omega)$, and ${}^E P$ is thus absorbing. 



The notation ${}^E P$ will be used in later sections to refer either to the 
chain P with the states of E made absorbing or to the chain P made so 
that it disappears instead of entering E. If 

\[ P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix}, \]

with the first block of rows and columns indexed by E, then these two 
chains are, respectively, 

\[ \begin{pmatrix} I & 0 \\ R & Q \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 0 \\ 0 & Q \end{pmatrix}. \]

It will be clear from the context which one is meant. 

There is one more important way in which absorbing chains arise. 
Suppose that $P1 \ne 1$. If we add an absorbing state 0, the state 
“stopped,” to the state space S and define $\bar{P}$ by 

\[ \bar{P}_{ij} = P_{ij} \quad \text{if } i \ne 0 \text{ and } j \ne 0, \]
\[ \bar{P}_{0j} = \delta_{0j}, \]
\[ \bar{P}_{i0} = 1 - \sum_{k \in S} P_{ik} \quad \text{if } i \ne 0, \]

then $\bar{P}$, called the enlarged chain, may be absorbing. If P is a finite 
transient chain, then $\bar{P}$ necessarily will be absorbing by Proposition 5-14. 

With this set of propositions to indicate how absorbing chains arise, 
we conclude with an investigation of the properties of the vector a. 

Proposition 5-16: If P is a transient chain, then a = N1. 

Proof: 

\[ \begin{aligned} (N1)_i &= \sum_j N_{ij} \quad \text{summed over the transient states } j \\ &= \sum_j M_i[n_j] \\ &= M_i\Big[\sum_j n_j\Big] \quad \text{by monotone convergence.} \end{aligned} \]

But $a = \sum_j n_j$, where the sum is taken over transient states j. Thus 
$(N1)_i = M_i[a] = a_i$. 

Corollary 5-17: If P is a transient chain for which P1 = 1 and if a 
has only finite entries, then x = a is the unique minimum non-negative 
solution of the equation (I - Q)x = 1. 

Proof: It is the unique minimum non-negative solution by Proposition 
5-16 and Proposition 5-11. 
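One way to see the minimality in Corollary 5-17 numerically is to iterate x ← 1 + Qx starting from x = 0: the k-th iterate is the partial sum (I + Q + ... + Q^{k-1})1, which increases to N1 = a. The following is a minimal sketch under stated assumptions, not a method taken from the text: the function name `minimal_solution`, the sweep count, and the choice of the symmetric walk on {0, ..., 4} as the example are all illustrative.

```python
def minimal_solution(Q, f, sweeps=500):
    """Iterate x <- f + Qx from x = 0.

    After k sweeps x equals (I + Q + ... + Q^(k-1)) f, so for f >= 0 the
    iterates increase toward Nf, the minimum non-negative solution.
    """
    m = len(f)
    x = [0.0] * m
    for _ in range(sweeps):
        x = [f[i] + sum(Q[i][j] * x[j] for j in range(m)) for i in range(m)]
    return x

# Transient part of the symmetric walk on {0, 1, 2, 3, 4} (states 1, 2, 3).
Q = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
a = minimal_solution(Q, [1.0, 1.0, 1.0])   # approximates the mean absorption times
```

For a finite chain the solution of (I - Q)x = 1 is unique, so the iteration simply recovers it; the monotone approach from below is what carries over to the denumerable case.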



4. Finite drunkard’s walk 



The finite drunkard’s walk is a Markov chain defined on the integers 
{0, 1, ..., n} with states 0 and n absorbing and with transition 
probabilities 

\[ P_{i,i+1} = p \]

and 

\[ P_{i,i-1} = q = 1 - p \quad \text{for } 0 < i < n. \]

If we set r = q/p, two cases arise. Either r = 1 and $\{x_n\}$ is a martingale 
or $r \ne 1$ and $\{r^{x_n}\}$ is a martingale. 

We shall use the second martingale systems theorem (Theorem 3-15) 
to compute the entries of B, H, N, and a for the case r = 1. 

To compute the entries of the B matrix when r = 1, we note that 
B1 = 1 by Proposition 5-14; therefore $B_{i0} = 1 - B_{in}$ for each transient 
state i. Since $\{x_n\}$ is a bounded martingale, Corollary 3-16 applies 
with the time taken as the time of absorption (which is a stopping time 
because P is absorbing if and only if a is finite a.e.). Then 

\[ i = 0 \cdot B_{i0} + n \cdot B_{in}, \]

so that 

\[ B_{in} = \frac{i}{n} \]

and 

\[ B_{i0} = 1 - \frac{i}{n}. \]

To find the entry $H_{ij}$ of the H-matrix, we make state j absorbing and 
consider the resulting process. If i < j, the modified process is the 
drunkard’s walk on the integers {0, ..., j} with j absorbing. Hence 
$H_{ij} = i/j$. If i > j, the modified process is the drunkard’s walk on 
{j, ..., n}. Renumbering the states, we can consider the process as 
starting at i - j and taking place on the states {0, ..., n - j}. Thus, 

\[ H_{ij} = 1 - \frac{i - j}{n - j} = \frac{n - i}{n - j}. \]

To get $\bar{H}_{ii}$, where i is transient, we use the fact that $\bar{H} = PH$, so that 

\[ \bar{H}_{ii} = p H_{i+1,i} + q H_{i-1,i}. \]

Since p = q = 1/2, 

\[ \bar{H}_{ii} = 1 - \frac{n}{2i(n - i)}. \]

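The N-matrix is determined as a function of H and $\bar{H}$ by 

\[ N_{jj} = \frac{1}{1 - \bar{H}_{jj}} \]

and 

\[ N_{ij} = H_{ij} N_{jj}. \]

We find 

\[ N_{ij} = \begin{cases} \dfrac{2}{n}\, i\,(n - j) & \text{for } i \le j \\[1ex] \dfrac{2}{n}\, j\,(n - i) & \text{for } i \ge j. \end{cases} \]

Finally, we have 

\[ a_i = M_i[a] = (N1)_i = \sum_{j=1}^{n-1} N_{ij} = i(n - i). \]

The case $r \ne 1$ proceeds in the same way. By the systems theorem 

\[ r^i = B_{i0} r^0 + B_{in} r^n. \]

From this equation one easily deduces that 

\[ B_{i0} = \frac{r^i - r^n}{1 - r^n}. \]

After first computing H and N, we find 

\[ a_i = M_i[a] = \frac{1}{q - p} \left[ i - n\, \frac{1 - r^i}{1 - r^n} \right]. \]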
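The closed forms for the finite drunkard’s walk can be checked against the equations that characterize them. The sketch below is illustrative and rests on assumptions that should be stated plainly: it takes the formulas to be N_ij = (2/n) min(i, j)(n - max(i, j)) and a_i = i(n - i) for r = 1, and B_i0 = (r^i - r^n)/(1 - r^n) and a_i = [i - n(1 - r^i)/(1 - r^n)]/(q - p) for r ≠ 1, and it verifies N(I - Q) = I, a = N1, and the one-step equations B_i0 = p B_{i+1,0} + q B_{i-1,0} and a_i = 1 + p a_{i+1} + q a_{i-1}. The function names are hypothetical.

```python
from fractions import Fraction

def check_symmetric(n):
    """Case r = 1: check N(I - Q) = I and that the row sums of N equal i(n - i)."""
    states = range(1, n)                       # transient states 1, ..., n-1
    N = {(i, j): Fraction(2 * min(i, j) * (n - max(i, j)), n)
         for i in states for j in states}

    def Q(k, j):
        # One-step probability between transient states when p = q = 1/2.
        return Fraction(1, 2) if abs(k - j) == 1 else Fraction(0)

    for i in states:
        for j in states:
            entry = N[i, j] - sum(N[i, k] * Q(k, j) for k in states)
            if entry != (1 if i == j else 0):
                return False
        if sum(N[i, j] for j in states) != i * (n - i):
            return False
    return True

def check_asymmetric(n, p):
    """Case r != 1: check the one-step equations and boundary values exactly."""
    q = 1 - p
    r = q / p
    B0 = [(r**i - r**n) / (1 - r**n) for i in range(n + 1)]
    a = [(i - n * (1 - r**i) / (1 - r**n)) / (q - p) for i in range(n + 1)]
    interior = all(
        B0[i] == p * B0[i + 1] + q * B0[i - 1]
        and a[i] == 1 + p * a[i + 1] + q * a[i - 1]
        for i in range(1, n)
    )
    boundary = B0[0] == 1 and B0[n] == 0 and a[0] == 0 and a[n] == 0
    return interior and boundary
```

Exact rational arithmetic is used so that the identities hold with equality rather than up to rounding; pass p as a `Fraction` different from 1/2 to `check_asymmetric`.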



When r > 1, this process is sometimes known as “gambler’s ruin” 
because of the following interpretation. A gambler walks into a 
gambling house with i dollars in his pocket, and the house has n - i 
dollars to bet against him. In a given game the gambler has probability 
p of winning. Since the house fixes the odds, we have p < 1/2 
and therefore r > 1. If the game is played repeatedly, $x_k$ in the above 
Markov chain represents the gambler’s cash after k games, and $B_{i0}$ is 
the probability of his eventual ruin. Since r > 1, 

\[ B_{i0} = \frac{1 - r^{-(n-i)}}{1 - r^{-n}} > 1 - r^{-(n-i)} \]

is nearly 1 when n - i (the house’s capital) is large. Thus the gambler 
is nearly sure to be ruined, no matter how rich he is. However, $a_i$ is 
approximately i/(q - p), which is very large if i is substantial and p 
is near to 1/2. Thus the gambler is likely to have a long run for his 
money. 
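To put numbers on the gambler’s ruin interpretation, the following sketch evaluates the ruin probability and the mean duration of play. It is an illustration under assumptions: the function names are hypothetical, the figures (p = 9/19, stake 100, house capital 100) are made up for the example, and the mean duration is written as (i - n B_in)/(q - p), which is the same as the closed form for a_i because B_in = (1 - r^i)/(1 - r^n).

```python
def ruin_probability(i, n, p):
    """B_{i0} for p != 1/2, in the form (1 - r^-(n-i)) / (1 - r^-n) with r = q/p."""
    r = (1.0 - p) / p
    # Negative exponents keep the powers small when r > 1.
    return (1.0 - r ** (-(n - i))) / (1.0 - r ** (-n))

def mean_duration(i, n, p):
    """a_i = (i - n * B_{in}) / (q - p), where B_{in} = 1 - B_{i0}."""
    q = 1.0 - p
    win = 1.0 - ruin_probability(i, n, p)
    return (i - n * win) / (q - p)

# Hypothetical figures: the gambler starts with 100, the house with 100,
# and each game is won with probability 9/19.
ruin = ruin_probability(100, 200, 9 / 19)
games = mean_duration(100, 200, 9 / 19)
```

With these figures ruin is almost certain, yet the expected number of games is close to i/(q - p) = 1900, in line with the remark that the gambler gets a long run for his money.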



5. Infinite drunkard’s walk 

Extending the finite drunkard’s walk to a process P defined on all 
of the non-negative integers, we set 

\[ P_{0i} = \delta_{0i}, \]
\[ P_{i,i+1} = p \quad \text{for } 0 < i < \infty, \]
\[ P_{i,i-1} = q = 1 - p \quad \text{for } 0 < i < \infty. \]

Again we take r = q/p. 

Our first problem is to establish the sense in which the infinite 
drunkard’s walk P is the limiting case of the finite drunkard’s walk. 

Let ${}^nB$, ${}^nN$, and ${}^na$ denote, respectively, the absorption matrix, the 
fundamental matrix, and the mean time to absorption vector for the 
finite drunkard’s walk on the integers {0, ..., n}. Define in connection 
with the infinite drunkard’s walk the random variable ${}^kn_j$ to be the 
number of times the process is in state j up to the first time it is in state 
k. Let p be the statement that the process is not absorbed at state 0, 
and let $p_k$ be the statement that the process is not absorbed at state 0 
at any time up to the time it reaches state k. 

Proposition 5-18: In the infinite drunkard’s walk 

\[ B_{i0} = \lim_n {}^nB_{i0}, \]
\[ N_{ij} = \lim_n {}^nN_{ij}, \]

and 

\[ a_i = \lim_n {}^na_i. \]

Proof: We have $\Pr_i[p] = 1 - B_{i0}$ and $\Pr_i[p_n] = 1 - {}^nB_{i0}$. The truth 
sets of the $p_n$ decrease with n, and their intersection is the truth set 
of p; hence we have, by Proposition 1-16, 

\[ 1 - B_{i0} = \lim_n (1 - {}^nB_{i0}). \]

For the N-matrix we note that 

\[ {}^nN_{ij} = M_i[{}^nn_j] \]

and 

\[ n_j = \lim_n {}^nn_j \quad \text{monotonically.} \]

The result for N therefore follows from the Monotone Convergence 
Theorem. Since $a_i = M_i[a] = \sum_j N_{ij}$, the assertion about $a_i$ is also a 
consequence of monotone convergence. 

Taking the limits of some of the quantities computed for the finite 
drunkard’s walk we find that 

\[ B_{i0} = \begin{cases} 1 & \text{if } r \ge 1 \\ r^i & \text{if } r < 1 \end{cases} \]

and 

\[ a_i = \begin{cases} \dfrac{i}{q - p} & \text{if } r > 1 \\[1ex] +\infty & \text{if } r \le 1. \end{cases} \]

The value of $B_{i0}$ shows that the chain is absorbing when $p \le q$; 
that is, a is finite almost everywhere when $p \le q$. However, $M_i[a]$ is 
finite only when p < q. 

If we have calculated $B_{i0}$ and have seen that a is a stopping time 
when $p \le q$, we may compute $M_i[a]$ directly from martingales without 
any knowledge of the N-matrix. Let p < q, and for $0 < s \le 1$ define 

\[ f(k, n) = \frac{s^k}{(ps + qs^{-1})^n}, \]

where n represents time and k represents position. Now 

\[ f(k, n) = s^k (ps + qs^{-1})^{-n} \le (ps + qs^{-1})^{-n}. \]

Since p < q, the quantity $ps + qs^{-1}$ is decreasing in s for $0 < s \le 1$ 
and equals 1 at s = 1; hence $ps + qs^{-1} \ge 1$ and 

\[ f(k, n) \le 1. \]

Hence f is bounded. It is easily seen that $\{f(x_n, n)\}$ is a martingale: 
Since f is bounded, M[|f|] is finite; the reader may verify the regularity 
property by showing that 

\[ p f(x_n + 1, n + 1) + q f(x_n - 1, n + 1) = f(x_n, n). \]

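Let a be the stopping time of Corollary 3-16. Taking i as the starting 
state, we have 

\[ s^i = M_i\left[ \frac{1}{(ps + qs^{-1})^{a}} \right] = \sum_n \Pr_i[a = n] \frac{1}{(ps + qs^{-1})^n}. \]

Set $1/u = ps + qs^{-1}$. Then 

\[ pus^2 - s + qu = 0 \]

and 

\[ s = \frac{1 - \sqrt{1 - 4pqu^2}}{2pu}. \]

Defining 

\[ \varphi(u) = \sum_n \Pr_i[a = n]\, u^n = s^i, \]

we note that 

\[ \varphi'(u) = \sum_n n \Pr_i[a = n]\, u^{n-1} \]

and that 

\[ \varphi'(1) = \sum_n n \Pr_i[a = n] = M_i[a] = a_i. \]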


Using the fact that $\sqrt{1 - 4pq} = q - p$ to calculate $\varphi'(1)$, we find that 
$a_i = i/(q - p)$, in agreement with the result obtained by the longer 
method. 

The present method further allows us to find the probability 
distribution Pr[a = n] by expanding $[(1 - \sqrt{1 - 4pqu^2})/(2pu)]^i$ as a 
power series in u. We thus find that 

\[ \Pr[a = n] = 0 \quad \text{for } n < i, \]
\[ \Pr[a = i] = q^i, \]
\[ \Pr[a = i + 1] = 0, \]
\[ \Pr[a = i + 2] = i p q^{i+1}, \]

and so on. 
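The first few values of Pr[a = n] can also be obtained without the power series, by pushing probability mass forward one step at a time and recording what is absorbed at 0. This is a sketch of that bookkeeping, not the method of the text; the function name `absorption_time_distribution` and the sample values i = 3, p = 1/3 are assumptions made for illustration.

```python
from fractions import Fraction

def absorption_time_distribution(i, p, n_max):
    """Return [Pr_i[a = 0], ..., Pr_i[a = n_max]] for a start i >= 1.

    The walk moves up with probability p and down with probability q = 1 - p,
    and is absorbed the first time it reaches 0.
    """
    q = 1 - p
    mass = {i: Fraction(1)}      # probability of being unabsorbed at each state
    dist = [Fraction(0)]         # Pr[a = 0] = 0 because the start is not 0
    for _ in range(n_max):
        nxt, absorbed = {}, Fraction(0)
        for k, w in mass.items():
            nxt[k + 1] = nxt.get(k + 1, 0) + p * w
            if k == 1:
                absorbed += q * w            # a step down from 1 is absorption
            else:
                nxt[k - 1] = nxt.get(k - 1, 0) + q * w
        dist.append(absorbed)
        mass = nxt
    return dist

p = Fraction(1, 3)
q = 1 - p
dist = absorption_time_distribution(3, p, 5)
```

For i = 3 the values at n = 3, 4, 5 are q^3, 0, and 3pq^4, matching the pattern q^i, 0, ipq^{i+1}.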

6. A zero-one law for sums of independent random variables 

Historically, the first infinite Markov chain that was studied was the 
sums of independent random variables process. We gather some of the 
results in the next few sections, beginning with two propositions and a 
corollary of rather general applicability. 

Proposition 5-19: If P is a Markov chain for which the only bounded 
P-regular functions are constant vectors, then, for each subset of states 
E, $\Pr_i[s^E] = 0$ or 1, independently of the starting state i. 

Proof: By Proposition 5-7, $s^E$ is regular and it is clearly bounded; 
therefore $s^E = c1$. On the other hand, by Proposition 5-8, 

\[ s^E = B^E s^E, \]

so that 

\[ c1 = cB^E 1 = ch^E. \]

Therefore, either c = 0 or $h^E = 1$. In the latter case, $s^E = 1$ by 
Proposition 5-8. 

As in Example 6 of Section 4-6, we let $p_k = P_{i,i+k}$, which, by 
assumption, is independent of i. 

Proposition 5-20: Let P be the transition matrix of a Markov chain 
obtained from sums of independent random variables. If, for each pair 
of states q and r, there is a state s such that q can be reached from s or s 
can be reached from q and such that r can be reached from s or s can be 
reached from r, then the only bounded regular functions are constant 
functions. In particular, the hypothesis is satisfied if all pairs of states 
communicate. 



Proof: Let f be non-constant regular and suppose $f_q \ne f_r$. We 
shall assume that s can be reached from both q and r; the proof in the 
other cases is completely analogous. Let $q, q + a_1, q + a_1 + a_2, \ldots, 
q + a_1 + \cdots + a_m$ and $r, r + b_1, \ldots, r + b_1 + \cdots + b_n$ be respective 
paths of positive probability leading from q to s and from r to s. Then 

\[ f_{q + a_1 + \cdots + a_m} = f_s = f_{r + b_1 + \cdots + b_n}, \]

so that at least one of the equalities in the two chains 

\[ f_q = f_{q + a_1} = f_{q + a_1 + a_2} = \cdots = f_{q + a_1 + \cdots + a_m} \]

and 

\[ f_r = f_{r + b_1} = f_{r + b_1 + b_2} = \cdots = f_{r + b_1 + \cdots + b_n} \]

must be false, since otherwise $f_q = f_r$. Without loss of generality, let 
$f_{q + \cdots + a_{k-1}} \ne f_{q + \cdots + a_k}$ and let $a = a_k$. Then $p_a > 0$. Let 
$g_i = f_{i+a} - f_i$. Then g is not identically 0. Further, g is regular because 

\[ \begin{aligned} \sum_j P_{ij} g_j &= \sum_j P_{ij} f_{j+a} - \sum_j P_{ij} f_j \\ &= \sum_j P_{i+a,j+a} f_{j+a} - \sum_j P_{ij} f_j \\ &= f_{i+a} - f_i \\ &= g_i. \end{aligned} \]

Suppose that for all i, $|f_i| \le c$. Then $|g_i|$ is bounded by 2c. Since 
multiplying g by -1 affects neither its regularity nor its boundedness, 
we may assume $b = \sup_i g_i$ is positive and finite. For any i and any 
m > 0, 

\[ \Big| \sum_{k=0}^{m-1} g_{i+ka} \Big| = |f_{i+ma} - f_i| \le 2c. \]

Choose N so that $N \cdot b/2 > 2c$. Let $p^{(n)} = (P^n)_{i,i+na} \ge (p_a)^n > 0$, 
$p = \min_{n < N} p^{(n)}$, and let t be a state such that $g_t > b(1 - p/2)$. A 
choice for t exists since b is finite and since p > 0. Then for n < N, 

\[ \begin{aligned} b\Big(1 - \frac{p}{2}\Big) < g_t &= \sum_k (P^n)_{tk}\, g_k \\ &= (P^n)_{t,t+na}\, g_{t+na} + \sum_{k \ne t+na} (P^n)_{tk}\, g_k \\ &\le p^{(n)} g_{t+na} + (1 - p^{(n)})\, b. \end{aligned} \]

Thus 

\[ g_{t+na} > b/2 \quad \text{for } n < N. \]

Hence 

\[ \sum_{k=0}^{N-1} g_{t+ka} > 2c, \]

a contradiction. 

Corollary 5-21: If P is a sums of independent random variables 
Markov chain in which all pairs of states communicate, then for each 
subset of states E, $\Pr_i[s^E] = 0$ or 1, independently of the starting state i. 

7. Sums of independent random variables on the line 

Let P be a sums of independent random variables Markov chain 
indexed by the integers and defined by the probability distribution 
$\{p_k\}$. The set of integers k for which $p_k > 0$ we shall call the set of 
k-values associated with the chain. We shall assume that the greatest 
common divisor of the k-values is one. Thus, if both positive and 
negative k-values exist, we see (from Lemma 1-66, for example) that all 
pairs of states communicate. 

The mean m for the process is defined by $m = \sum_k k p_k$; it is said to 
exist if and only if the positive and negative parts of the sum are not 
both infinite. In this section we shall establish the following result. 

Proposition 5-22: If P is a Markov chain representing sums of 
independent random variables on the line, if there are finitely many 
k-values and if they have greatest common divisor one, if $\sum_k p_k = 1$, 
and if $m = \sum_k k p_k$, then in order for the chain to be recurrent it is 
necessary and sufficient that m = 0. 

Before we come to the proof, two comments are in order. The first 
is that the proposition can be generalized to the case where there are 
infinitely many k-values as long as the mean m still exists and the 
k-values still have greatest common divisor one. The same condition 
m = 0 is necessary and sufficient for P to be recurrent. The second 
comment is that the necessity of the condition m = 0 is an immediate 
consequence of the Strong Law of Large Numbers (Theorem 3-19) and 
that the special added assumption we used in the proof of that theorem 
translates exactly into the condition that there are only finitely many 
k-values. Nevertheless, we give a different proof. 

For the proof we may assume that both positive and negative 
k-values exist. Otherwise, the chain is obviously transient. Recalling 
our discussion in Example 2 of Section 3-2, we observe that if both 
positive and negative k-values exist, then there are either two distinct 
real roots or one double real root of the equation 

\[ f(s) = \sum_k p_k s^k = 1. \]

If there is a root r other than s = 1, then $\{r^{x_n}\}$ is a non-negative 
martingale. And if s = 1 is a double root, then $\{x_n\}$ is a martingale. 
But $f'(1) = (\sum_k k p_k s^{k-1})_{s=1} = \sum_k k p_k = m$, so that f(s) = 1 has a double 
root at s = 1 if and only if m = 0. In the case $m \ne 0$, $\{r^{x_n}\}$ converges 
a.e. and must converge to the zero function. Thus, according as 
r < 1 or r > 1, we have $\lim x_n = +\infty$ or $\lim x_n = -\infty$. Hence, the 
process returns to each state only finitely often with probability one. 
But if the chain were recurrent, each state would be reached infinitely 
often with probability one. Therefore, if $m \ne 0$, the chain is transient. 

In the case in which m = 0, let -u be the smallest k-value and let v 
be the largest k-value. Let E be the set of states {-u, ..., -2, -1} 
and let E' be the set of states {j, j + 1, ..., j + v - 1} for some fixed 
j. Start the process in state i with $0 \le i < j$, and let t be the time to 
reach the set $E \cup E'$. The chain stopped at time t is absorbing by 
Proposition 5-14, and t is therefore a stopping time. Since $\{x_n\}$ is a 
bounded martingale before time t, Corollary 3-16 applies. Therefore, 
$M[x_0] = M[x_t]$, and for $0 \le i < j$, we have 

\[ \begin{aligned} i = M_i[x_0] = M_i[x_t] &= \sum_{k \in E} k\, B^{E \cup E'}_{ik} + \sum_{k \in E'} k\, B^{E \cup E'}_{ik} \\ &\ge -u \sum_{k \in E} B^{E \cup E'}_{ik} + j \sum_{k \in E'} B^{E \cup E'}_{ik} \\ &= -u h_i^E + j h_i^{E'}, \end{aligned} \]

which, by Proposition 5-13, 

\[ = -u h_i^E + j(1 - h_i^E). \]

Then 

\[ (j + u) h_i^E \ge j - i \]

and 

\[ h_i^E \ge \frac{j - i}{j + u}. \]

Letting $j \to \infty$, we find that $h_i^E = 1$ for all $i \ge 0$. 

Reversing the argument for i < 0 and F = {1, ..., v}, we find 
similarly that $h_i^F = 1$ for all i < 0. Thus, for any state i, $h_i^{E \cup F} = 1$. 
By Proposition 5-8, $s^{E \cup F} = 1$. Since $E \cup F$ is a finite set, Proposition 
4-28 applies, and the chain is recurrent. 



8. Examples of sums of independent random variables 

Calculations with sums of independent random variables on the line 
normally involve either martingales or difference equations. We shall 
illustrate in this section each of these methods with an example. 

Example 1: Let the defining distribution for a Markov chain P 
representing sums of independent random variables on the integers be 
$\{p_1 = q, p_2 = p\}$. The process is obviously transient since $\bar{H}_{jj} = 0$ 
and $N_{jj} = 1$ for all j. Since $N_{ij} = H_{ij} N_{jj} = H_{ij} = H_{0,j-i}$, we will 
have determined the H-matrix and the N-matrix completely by finding 
the value of $H_{0k}$ for all k. We note first that $H_{0k} = 0$ if k is negative. 
For the case k > 0, $\{r^{x_n}\}$ is a nonconstant martingale if r is a nonzero 
root other than one of the equation $\sum_k p_k s^k = qs + ps^2 = 1$. Thus 
$\{(-1/p)^{x_n}\}$ is a martingale. Taking the stopping time t as the time 
when the process reaches or passes state k, we find from Corollary 3-16 
that for the process started at state 0 

\[ (-1/p)^0 = H_{0k}\,(-1/p)^k + (1 - H_{0k})\,(-1/p)^{k+1}. \]

Therefore, 

\[ H_{0k} = \frac{1}{1 + p}\,\big(1 - (-p)^{k+1}\big). \]

It is interesting to note that 

\[ \lim_{k \to \infty} H_{0k} = \frac{1}{1 + p}. \]

This result can also be obtained from the Renewal Theorem of Section 
1-6 if we observe that 

\[ m = \sum_k k p_k = q + 2p = 1 + p, \]

so that 

\[ \frac{1}{1 + p} = \frac{1}{m}. \]
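A quick cross-check of the formula for H_0k in Example 1 is available from a last-step recursion. The sketch below is not the martingale argument of the text: it assumes the closed form H_0k = (1 - (-p)^{k+1})/(1 + p) and compares it with the recursion H_0k = q H_{0,k-1} + p H_{0,k-2}, which holds because the walk only moves right, so it arrives at k either by a step of 1 from k - 1 or by a step of 2 from k - 2. The function name and the sample value p = 1/3 are hypothetical.

```python
from fractions import Fraction

def hitting_probabilities(p, k_max):
    """Return [H_00, H_01, ..., H_0,k_max] for steps +1 (prob q) and +2 (prob p)."""
    q = 1 - p
    H = [Fraction(1)]                       # H_00 = 1: the walk starts at 0
    for k in range(1, k_max + 1):
        two_back = H[k - 2] if k >= 2 else 0   # state -1 is never visited
        H.append(q * H[k - 1] + p * two_back)
    return H

p = Fraction(1, 3)
H = hitting_probabilities(p, 12)
closed_form = [(1 - (-p) ** (k + 1)) / (1 + p) for k in range(13)]
```

The two lists agree term by term, and the entries approach 1/(1 + p) = 1/m as k grows.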



Example 2: Let $p_{-1} = 2/9$, $p_1 = 3/9$, and $p_2 = 4/9$. Then 
$m = \sum_k k p_k = 1 \ne 0$, 
and the process is transient by Proposition 5-22. Let the transition 
matrix be called P. 

If g is a P-regular vector at state i, then $g_i = (Pg)_i$ and 

\[ g_i = \tfrac{2}{9} g_{i-1} + \tfrac{3}{9} g_{i+1} + \tfrac{4}{9} g_{i+2}. \]

We shall need a characterization of such vectors in the calculation of the 
H-matrix. The difference equation we have just formed may be 
rewritten as 

\[ 4g_{i+2} + 3g_{i+1} - 9g_i + 2g_{i-1} = 0. \]

Its characteristic equation (see Section 1-6b) is 

\[ 4s^3 + 3s^2 - 9s + 2 = 0, \]

whose solutions are s = 1, 1/4, and -2. Thus, 

\[ g_i = A + B(\tfrac{1}{4})^i + C(-2)^i. \]

In finding the entries of the H-matrix, we know that $H_{ij} = H_{i-j,0}$; 
therefore, it is sufficient to consider only the entries $H_{i,0}$. We shall 
look at the cases i > 0 and i < 0 separately. 

Suppose i > 0. Since the only negative k-value is -1, it is impossible 
for the process to start in state i + 1 and reach state 0 without 
first passing through state i. Thus, for $i \ge 0$, 

\[ H_{i+1,0} = H_{i+1,i} H_{i,0} = H_{1,0} H_{i,0} \]

with $H_{00} = 1$. The result is a first-order difference equation whose 
solution is $H_{i,0} = c(H_{1,0})^i$. Setting i = 0 shows that c = 1. Thus 
$H_{i,0}$ is exponential. 

But $\bar{H} = PH$, so that $H_{i,0} = (PH)_{i,0}$ for all i > 0. Thus $H_{i,0}$ 
satisfies the difference equation 

\[ g_i = \tfrac{2}{9} g_{i-1} + \tfrac{3}{9} g_{i+1} + \tfrac{4}{9} g_{i+2} \]

for i > 0. It therefore satisfies 

\[ g_{i+1} = \tfrac{2}{9} g_i + \tfrac{3}{9} g_{i+2} + \tfrac{4}{9} g_{i+3} \]

for $i \ge 0$. Hence $H_{i,0} = A + B(\tfrac{1}{4})^i + C(-2)^i$ for all $i \ge 0$. Since 
$H_{i,0}$ is known to be exponential, two of the coefficients A, B, and C are 
zero and the other is one. The alternatives C = 1 and A = 1 are 
eliminated, respectively, by the facts that -2 is not a probability and 
that P drifts to the right a.e. Thus, 

\[ H_{i,0} = (\tfrac{1}{4})^i \quad \text{for } i \ge 0. \]
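The algebra in Example 2 is easy to confirm with exact arithmetic. The sketch below is illustrative only: it assumes the step distribution p_{-1} = 2/9, p_1 = 3/9, p_2 = 4/9, which is the one implied by the difference equation 4g_{i+2} + 3g_{i+1} - 9g_i + 2g_{i-1} = 0, and the helper names `char_poly` and `is_regular_at` are hypothetical.

```python
from fractions import Fraction

# Assumed step distribution, read off from the difference equation.
steps = {-1: Fraction(2, 9), 1: Fraction(3, 9), 2: Fraction(4, 9)}

def char_poly(s):
    """Characteristic polynomial 4s^3 + 3s^2 - 9s + 2."""
    return 4 * s**3 + 3 * s**2 - 9 * s + 2

def is_regular_at(g, i):
    """Check g(i) = sum_k p_k g(i + k), the regularity condition at state i."""
    return g(i) == sum(pk * g(i + k) for k, pk in steps.items())

def H_to_zero(i):
    """Candidate H_{i,0} = (1/4)^i for i >= 0."""
    return Fraction(1, 4) ** i

roots = [Fraction(1), Fraction(1, 4), Fraction(-2)]
mean_step = sum(k * pk for k, pk in steps.items())
```

The three roots annihilate the polynomial, the mean step is 1, and (1/4)^i satisfies the regularity condition at every i > 0.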

For i < 0 we again use the fact that $\bar{H} = PH$, and we find that $H_{i,0}$ 
is a solution of the equation 

\[ g_{i-2} = \tfrac{2}{9} g_{i-3} + \tfrac{3}{9} g_{i-1} + \tfrac{4}{9} g_i \]

for all $i \le 1$. Therefore, 

\[ H_{i,0} = A + B(\tfrac{1}{4})^i + C(-2)^i \]

for all $i \le 1$. Known values for $H_{i,0}$ when i = 0 and i = 1 give us 
two conditions on the three unknowns A, B, and C. The fact that 
$H_{i,0} \le 1$ as $i \to -\infty$ tells us that B = 0. We have as a result 

\[ H_{i,0} = \tfrac{3}{4} + \tfrac{1}{4}(-2)^i \quad \text{for } i \le 0. \]

From a knowledge of the H-matrix, we can compute $\bar{H}$ by $\bar{H} = PH$. 
The entries of the N-matrix follow from 

\[ N_{jj} = \frac{1}{1 - \bar{H}_{jj}} \]

and 

\[ N_{ij} = H_{ij} N_{jj}. \]

9. Ladder process for sums of independent random variables 

For a sums of independent random variables Markov chain defined 
on the integers, we define a sequence of positive step times $s_n$ 
inductively as follows: $s_0(\omega)$ is the least n such that $x_n(\omega) > 0$, and $s_i(\omega)$ is 
the least n such that $x_n(\omega) > x_{s_{i-1}}(\omega)$. If we construct a stochastic 
process by watching the old Markov chain only at the positive step 
times (that is, by calling the nth outcome in the new process the $s_n$th 
outcome in the old process), then the strong Markov property as 
formulated in Theorem 4-9 implies that the new process is a Markov 
chain. We shall go through this implication in detail. 

Proposition 5-23: If P is a sums of independent random variables 
Markov chain defined on the integers, then the stochastic process whose 
nth outcome is the $s_n$th outcome in P is a Markov chain $P^+$. Moreover, 

\[ P^+_{ij} = P^+_{0,j-i}. \]

Proof: The times $s_n$ are random times. Applying Theorem 4-9 to 
the time $s_n$ and the statement $r = (x_{s_{n+1}} = c_{n+1})$, we find that, if 
$\Pr[x_{s_0} = c_0 \wedge \cdots \wedge x_{s_n} = c_n] > 0$, then 

\[ \Pr[x_{s_{n+1}} = c_{n+1} \mid x_{s_0} = c_0 \wedge \cdots \wedge x_{s_n} = c_n] = \Pr[x_{s_{n+1}} = c_{n+1} \mid x_{s_n} = c_n]. \]

And if $\Pr[x_{s_n} = c_n] > 0$, then 

\[ \Pr[x_{s_{n+1}} = c_{n+1} \mid x_{s_n} = c_n] = \Pr_{c_n}[x_{s_1} = c_{n+1}]. \]

Thus the process is a Markov chain $P^+$. The fact that $P^+_{ij} = P^+_{0,j-i}$ 
follows from the fact that P represents sums of independent random 
variables. 

The chain $P^+$ is called the ladder process for P. The ladder process 
moves from i to j if j is the first state greater than i that is reached in 
the original process. If the mean step m in P is positive, then the 
process reaches or passes any given positive state with probability one, 
so that the $s_n$ are finite a.e. Hence $P^+1 = 1$, and the ladder process 
represents sums of independent random variables. 

As an example, we shall compute explicitly the ladder process 
associated with Example 2 of the preceding section. For the given 
chain we have $p_{-1} = 2/9$, $p_1 = 3/9$, and $p_2 = 4/9$. The ladder process has 
two k-values, namely 1 and 2, and thus has a distribution $\{p_1^+, p_2^+\}$. 
Since the positive step times are finite a.e., we have $P^+1 = 1$, or 

\[ p_1^+ + p_2^+ = 1. \]

To find the values of $p_1^+$ and $p_2^+$, we note that 

\[ \begin{aligned} p_2^+ &= P^+_{02} \\ &= P_{02} + P_{0,-1}\, p_1^+ p_2^+ \\ &= p_2 + p_{-1}\, p_1^+ p_2^+. \end{aligned} \]

Putting in the known values for $p_{-1}$ and $p_2$, we find 

\[ p_1^+ = p_2^+ = \tfrac{1}{2}. \]

The ladder process for our Example 2 is therefore an instance of 
Example 1 in the same section. 
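The ladder probabilities can be recovered by solving the quadratic that results from eliminating p_1^+ = 1 - p_2^+. The sketch below is an illustration under assumptions: it takes the relation to be x = p_2 + p_{-1}(1 - x)x with x = p_2^+, rearranged as p_{-1}x^2 + (1 - p_{-1})x - p_2 = 0, and it uses p_{-1} = 2/9 and p_2 = 4/9 as inferred from the difference equation of Example 2. The function name `ladder_p2` is hypothetical.

```python
from math import sqrt

def ladder_p2(p_minus1, p2):
    """Positive root x of p_minus1*x^2 + (1 - p_minus1)*x - p2 = 0."""
    a, b, c = p_minus1, 1.0 - p_minus1, -p2
    # c < 0 < a, so exactly one root is positive; it is the '+' branch.
    return (-b + sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

p2_plus = ladder_p2(2 / 9, 4 / 9)   # probability that the ladder step is 2
p1_plus = 1.0 - p2_plus             # probability that the ladder step is 1
```

Both values come out as 1/2 up to rounding, so the ladder process takes steps of 1 and 2 with equal probability.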

10. The basic example 

The basic example is a Markov chain with state space the 
non-negative integers and with transition probabilities 

\[ P_{i-1,i} = p_i, \quad i > 0, \]
\[ P_{i-1,0} = q_i = 1 - p_i. \]

We normally assume that none of the $p_i$'s is 0. A row vector $\beta$ is 
defined by 

\[ \beta_0 = 1, \]
\[ \beta_i = p_i \beta_{i-1} \quad \text{for } i > 0; \]

it is regular if and only if $\lim_{i \to \infty} \beta_i = 0$, and the process is recurrent or 
transient according as the limit is or is not 0. 

In this section we shall compute the H and N matrices for the basic 
example when it is transient, and we shall show that a transient basic 
example has no non-zero regular row vector. 

The process cannot leave the set {0, l,...,j} without hitting j. 
Hence = I if i < j. If i > j, then j can be reached only via 0, so 
that 

by Proposition 4-16. Thus we need only find HiQ. The only way the 
process can fail to reach 0 is to continue moving to the right from i. 
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Let j3, = fli” i Pi- Then 



1 - = n ^ 



and we find 



Then 



;>• 

n if i < j 



H,, = i 



1-^ ifi>i. 
Pi 



— Pj + l^j + l.j + Qj + l^Oj 



- 1 _ ^ 
i3/ 

Suppose now that the process is transient — that is, that > 0. 
Then 

1 _ 



so that 



1 - H,, jS„ 



if i < j 

Poo 



iSy /3y .. . . 

IK-ft 

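If $\beta_\infty > 0$, we know that $\beta$ is not regular. Indeed, a transient basic
example has no non-zero regular row vector. For if $\alpha$ is regular, then

\[ \sum_i \alpha_i q_{i+1} = \alpha_0 \]

and

\[ \alpha_{j-1}\,p_j = \alpha_j \quad \text{for } j > 0. \]

From the second condition we find by induction that $\alpha_j = \alpha_0\beta_j$. Then
the first condition yields

\[ \alpha_0 = \alpha_0 \sum_i \beta_i q_{i+1} = \alpha_0 \sum_i (\beta_i - \beta_{i+1}) = \alpha_0(1 - \beta_\infty). \]

Thus $\alpha_0 = 0$ and $\alpha = 0$.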
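
The closed forms for $H$ and $N$ can be checked on a concrete case. The sketch below assumes the hypothetical choice $p_i = 1 - 1/(i+1)^2$ (not taken from the text), for which $\beta_i = (i+2)/(2(i+1))$ and $\beta_\infty = \tfrac12$, so the chain is transient. It evaluates the formulas with exact fractions and verifies the first-step relation $N_{ij} = \delta_{ij} + p_{i+1}N_{i+1,j} + q_{i+1}N_{0j}$, which is $N = I + PN$ written out for this chain. The function names are illustrative.

```python
from fractions import Fraction


def p(i):
    """Hypothetical choice p_i = 1 - 1/(i+1)^2 for i >= 1 (an assumption)."""
    return 1 - Fraction(1, (i + 1) ** 2)


def beta(i):
    """beta_0 = 1 and beta_i = p_i * beta_{i-1}."""
    out = Fraction(1)
    for k in range(1, i + 1):
        out *= p(k)
    return out


# For this choice the product telescopes to beta_i = (i+2)/(2(i+1)),
# so the limit is 1/2 > 0 and the chain is transient.
BETA_INF = Fraction(1, 2)


def H(i, j):
    """Hitting probability: 1 if i <= j, else 1 - beta_inf / beta_i."""
    return Fraction(1) if i <= j else 1 - BETA_INF / beta(i)


def N(i, j):
    """Mean number of visits to j starting from i."""
    if i <= j:
        return beta(j) / BETA_INF
    return beta(j) / BETA_INF - beta(j) / beta(i)


def satisfies_first_step(i, j):
    """Check N_ij = delta_ij + p_{i+1} N_{i+1,j} + q_{i+1} N_{0,j}."""
    delta = 1 if i == j else 0
    step = p(i + 1)
    return N(i, j) == delta + step * N(i + 1, j) + (1 - step) * N(0, j)


ALL_OK = all(satisfies_first_step(i, j) for i in range(8) for j in range(8))
print(N(0, 0), H(3, 1), ALL_OK)
```

Exact rational arithmetic is used so that the identity is tested as an equality rather than up to rounding.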

11. Problems 

1. Consider the finite Markov chain with states

0   1   2   3   4

States 0 and 4 are absorbing. At each of the other states the process 
takes a step to the right with probability f , or a step to the left with 
probability Compute P, A, and B by means of Propositions 5-5 and 
5-6. 

2. If the states of a transient chain form a single closed set, show that each 
column of $N$ is a non-constant positive superregular function. [Note: 
We shall see later that there are no such functions for recurrent chains; 
hence their existence is a necessary and sufficient condition for a closed 
set to be transient.] 

3. Prove Proposition 5-8. Prove also that $h^E = Ne^E + s^E$. Interpret 
each result. 

4. In the basic example, let $E = \{0, 1, 2\}$. Compute $P^E$, $e^E$, and $s^E$. 
Check formulas (1), (2), and (3) in Proposition 5-8. 

5. Prove an analog of Theorem 5-10 for row vectors. Use it to show that 
if $\pi \ge \pi P \ge 0$ in a transient basic example, then there is a measure $\mu$ 
such that $\pi = \mu N$. 

6. For a transient chain let 

\[ x_i = M_i[n_E], \]

where $n_E$ is the number of times the chain is in the finite set of states $E$. 
Use a systems theorem to find an equation of the form 

\[ (I - P)x = y, \]

and prove that $x$ is the minimum non-negative solution. 

7. Find the probability in the p-q random walk started at 0 of reaching $+n$ 
before $-n$. [Hint: Use the results obtained for the finite drunkard's 
walk.] If $p > \tfrac12$, what happens to this probability as $n$ increases? 

8. The one-dimensional symmetric random walk is a process to which 
Corollary 5-21 applies. If $E$ is the set of primes, is $s^E$ equal to 1 or is it 
equal to 0? 

9. Let $x_0, x_1, x_2, \ldots$ be the outcome functions for the symmetric random 
walk on the integers started at 0. Show that there is no non-constant 
non-negative function $f(n)$ defined on the integers such that $f(x_0)$, 
$f(x_1), \ldots$ is a martingale. 

10. Show by direct computation that the sums of independent random 
variables process on the integers with p^ = \ and = | is recurrent. 

11. Find $H$ and $N$ for sums of independent random variables on the integers 
with $p_{-1} = p_2 = \tfrac12$. 

Problems 12 to 19 refer to sums of independent random variables on the 
integers with $p_{-1} = \tfrac13$ and $p_1 = \tfrac23$. 

12. Find H and N. 

13. Describe the long-range behavior of the chain. 

14. Give two examples of infinite sets $E$ to illustrate the two possibilities 
$s^E = 1$ and $s^E = 0$. 

15. Find all non- negative regular functions. 
16. Give a necessary and sufficient condition on a non-negative function $f$ 
that $Nf$ be finite-valued. 

17. In the previous problem, let $g = Nf$. Choose an $f$ satisfying your 
condition, and show that $(I - P)g = f$. 

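18. Let 

\[ h_i = \begin{cases} 2 + \left(\tfrac12\right)^i & \text{if } i \le 0 \\[4pt] (3 + i)\left(\tfrac12\right)^i & \text{if } i > 0. \end{cases} \]

Show that $h$ is superregular, and decompose $h$ as in Theorem 5-10. 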

19. Use the function of Problem 18 together with the Martingale Convergence 
Theorem to prove that the process is to the left of 0 only finitely often a.e. 

Problems 20 to 22 refer to the game of tennis. It will be necessary to know 
how one keeps score in tennis. A match is being played between A and B, 
and A has probability $p$ of winning any one point. 

20. Set up a single game as a transient chain with the two absorbing states 
“A wins” and “ B wins.” [Minimize the number of states, e.g., identify 
“30-30” with “deuce.”] Compute the probability that A wins the 
game as a function of p. 

21. Suppose that A has probability p' of winning a game. What is the 
probability that he wins a set ? What of winning the match (if he is 
required to win three sets) ? 

22. What is the probability that A wins the match if p = 0.6 ? What if 
p = 0.51 ? 




CHAPTER 6 



RECURRENT CHAINS 



1. Mean ergodic theorem for Markov chains 

Recurrent chains are Markov chains such that the set of states is a 
single recurrent class. They have the properties that $P\mathbf{1} = \mathbf{1}$, $H = E$, 
and $M_i[n_j] = \infty$. The study of recurrent chains begins with a 
characterization of finite-valued non-negative superregular measures and 
functions; the reader should turn back to Sections 1-6c and 1-6d for the 
terms referred to in what follows. 

We shall apply Proposition 1-63 and Corollary 1-64 to the sequence 
of matrices obtained as the Cesaro sums of the powers of a recurrent 
chain $P$. Define 

\[ L^{(n)} = \frac{1}{n}\sum_{k=0}^{n-1} P^k. \]

Then 

\[ 0 \le L^{(n)} \le E \quad \text{for all } n. \]



Theorem 6-1: If $P$ is the matrix of a recurrent chain, then the sequence 
of powers of $P$ is Cesaro summable to a limiting matrix $L$ with the 
properties $L \ge 0$ and $LP = L = PL = L^2$. 

Proof: We shall show that every convergent subsequence converges 
to the same limit $L$. The proof proceeds in four steps. 

(1) Since 

\[ L^{(n)} = \frac{1}{n}\,(I + P + \cdots + P^{n-1}), \]

we have 

\[ PL^{(n)} = \frac{1}{n}\,(P + P^2 + \cdots + P^n) = L^{(n)}P = L^{(n)} + \frac{1}{n}\,(P^n - I). \]

Let $\{L^{(n_\nu)}\}$ be a convergent subsequence; such a sequence exists by 
Proposition 1-63. Set $L = \lim_\nu L^{(n_\nu)}$. Then $L \ge 0$. Since 

\[ \lim_n \frac{1}{n}\,(P^n - I) = 0, \]

we have $\lim_\nu PL^{(n_\nu)} = L = \lim_\nu L^{(n_\nu)}P$. 

(2) By Proposition 1-56 (dominated convergence), we have 

\[ \lim_\nu PL^{(n_\nu)} = P \lim_\nu L^{(n_\nu)} = PL \]

and thus 

\[ PL = L. \]

By Proposition 1-55 (Fatou's Theorem), we may further conclude 

\[ \bigl(\lim_\nu L^{(n_\nu)}\bigr)P \le \lim_\nu \bigl(L^{(n_\nu)}P\bigr) = L \]

and 

\[ LP \le L. \]

(3) Suppose $LP$ is not equal to $L$. Then for some $i$ and $j$, $(LP)_{ij} < L_{ij}$. 
Summing the inequalities $(LP)_{ik} \le L_{ik}$ on $k$, we obtain 

\[ \sum_k (LP)_{ik} < \sum_k L_{ik} \]

since strict inequality holds in the $j$th entry. Thus $[(LP)\mathbf{1}]_i < (L\mathbf{1})_i$. 
Since $L$, $P$, and $\mathbf{1}$ are non-negative, associativity holds and 
$(LP)\mathbf{1} = L(P\mathbf{1}) = L\mathbf{1}$. Therefore, $[(LP)\mathbf{1}]_i = (L\mathbf{1})_i$, and we have a 
contradiction. Hence $LP = L = PL$. By induction, we readily see that 
$LP^n = L = P^nL$ for every $n$. Adding these results, we obtain finally 

\[ LL^{(n)} = L = L^{(n)}L. \]

(4) Let $\{L^{(m_\mu)}\}$ be a convergent subsequence with limit $\bar L$. It is 
sufficient to show that $\bar L = L$. From step (3) we have $L = LL^{(m_\mu)}$ 
for any $\mu$, and by Fatou's Theorem $L\mathbf{1} \le \mathbf{1}$ and $\bar L\mathbf{1} \le \mathbf{1}$. Thus, by 
dominated convergence, 

\[ L = \lim_\mu \bigl(LL^{(m_\mu)}\bigr) = L\bar L. \]

Interchanging the roles of $L$ and $\bar L$, we find $\bar L = \bar LL$. But by Fatou's 
Theorem $\bar LL \le L$ and $L\bar L \le \bar L$. Therefore, 

\[ L = L\bar L \le \bar L \quad \text{and} \quad \bar L = \bar LL \le L. \]

Hence $\bar L = L$ and $L = L^2$. 



Definition 6-2: If $P$ is a recurrent chain, $P$ is said to be a null chain 
if $L = 0$. If $L \ne 0$, $P$ is said to be an ergodic chain and the limit 
matrix $L$ is called $A$. 
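
The Cesaro averages of Theorem 6-1 can be computed directly for a small chain. The sketch below is an illustration under stated assumptions: the two-state chain `FLIP`, which alternates deterministically between its states, is a hypothetical example, and the function names are not from the text. The powers of this chain do not converge, but the averages $L^{(n)} = \frac{1}{n}(I + P + \cdots + P^{n-1})$ do.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cesaro_average(p, n):
    """Return L^(n) = (1/n) * (I + P + ... + P^(n-1)) with exact fractions."""
    size = len(p)
    power = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    total = [[Fraction(0)] * size for _ in range(size)]
    for _ in range(n):
        # Add the current power P^k, then advance to P^(k+1).
        total = [[total[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
        power = mat_mul(power, p)
    return [[entry / n for entry in row] for row in total]


# Hypothetical example: a chain of period two on the states {0, 1}.
FLIP = [[Fraction(0), Fraction(1)],
        [Fraction(1), Fraction(0)]]

print(cesaro_average(FLIP, 10))
print(cesaro_average(FLIP, 11))
```

For even $n$ every entry of the average is exactly $\tfrac12$; for $n = 11$ the diagonal entries are $\tfrac{6}{11}$ and the others $\tfrac{5}{11}$, so the averages approach the matrix with all entries $\tfrac12$, and this chain is ergodic in the sense of Definition 6-2.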
Proposition 6-3: If $P$ is a recurrent chain, every constant function is 
regular and the only non-negative (finite-valued) superregular functions 
are constants. 

Proof: Constant functions are trivially regular since $P\mathbf{1} = \mathbf{1}$. Let 
$h$ be a finite-valued non-negative superregular function, and let the 
chain be started in any fixed state $i$. Since $M_i[|h(x_0)|] = h_i < \infty$, 
$\{h(x_n)\}$ is a non-negative supermartingale (see Section 4-3). Thus 
$\lim_n h(x_n)$ exists and is finite with probability one by Corollary 3-13. 
If $h$ is not a constant function, then $h_j \ne h_k$ for some $j$ and $k$. Since the 
chain is in states $j$ and $k$ infinitely often a.e., $h(x_n) = h_j$ and $h(x_n) = h_k$ 
for infinitely many $n$ with probability one. Thus $h(x_n)$ diverges a.e., a 
contradiction. 



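To prove the corresponding result for measures, we introduce the 
dual matrix $\hat P$, defined whenever a positive finite-valued $P$-superregular 
measure $\alpha$ exists. The entries of $\hat P$ are $\hat P_{ij} = \alpha_j P_{ji}/\alpha_i$. 
Although we shall investigate $\hat P$ more fully in the next section, we mention 
some of its properties here. Suppose $P$ is recurrent. Since $\hat P_{ij} \ge 0$ 
and since 

\[ \sum_j \hat P_{ij} = \sum_j \frac{\alpha_j P_{ji}}{\alpha_i} = \frac{(\alpha P)_i}{\alpha_i} \le 1, \]

$\hat P$ is a transition matrix. Since all pairs of states communicate in $P$, 
they do in $\hat P$. Now, using induction on $n$, we note that if 

\[ (\hat P^{n-1})_{ij} = \frac{\alpha_j (P^{n-1})_{ji}}{\alpha_i} \quad \text{for all } i \text{ and } j, \]

then 

\[ (\hat P^n)_{ij} = \sum_k \hat P_{ik}(\hat P^{n-1})_{kj} = \sum_k \frac{\alpha_k P_{ki}}{\alpha_i}\cdot\frac{\alpha_j (P^{n-1})_{jk}}{\alpha_k} = \frac{\alpha_j (P^n)_{ji}}{\alpha_i}. \]

Summing on $n$, we see that $\hat M_i[n_i] = +\infty$ because $M_i[n_i] = \infty$. Hence 
$\hat P$ is recurrent. 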
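
The dual matrix is easy to form for a finite chain. The following sketch uses a hypothetical three-state recurrent chain `P` with regular measure `ALPHA = (2, 2, 1)` (both are assumptions chosen for illustration). It builds the matrix with entries $\alpha_j P_{ji}/\alpha_i$, confirms that its rows sum to one, and checks the two-step case of the induction above.

```python
from fractions import Fraction as F


def dual_matrix(p, alpha):
    """Return the alpha-dual of p, with entries alpha_j * p_ji / alpha_i."""
    n = len(p)
    return [[alpha[j] * p[j][i] / alpha[i] for j in range(n)]
            for i in range(n)]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical recurrent chain on {0, 1, 2}; ALPHA satisfies ALPHA P = ALPHA.
P = [[F(0), F(1), F(0)],
     [F(1, 2), F(0), F(1, 2)],
     [F(1), F(0), F(0)]]
ALPHA = [F(2), F(2), F(1)]

P_HAT = dual_matrix(P, ALPHA)
ROW_SUMS = [sum(row) for row in P_HAT]

# Two-step case of the induction: the square of the dual equals the dual of the square.
TWO_STEP_OK = mat_mul(P_HAT, P_HAT) == dual_matrix(mat_mul(P, P), ALPHA)

print(P_HAT, ROW_SUMS, TWO_STEP_OK)
```

In this example the dual chain moves 0 to 1 or 2 with probability $\tfrac12$ each, 1 to 0, and 2 to 1, which is the original chain run backward in time.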



Proposition 6-4: If P is a recurrent chain, all (finite-valued) non- 
negative superregular measures are regular and are uniquely deter- 
mined up to multiplication by a constant. A non-zero non-negative 
superregular measure is positive. 



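Proof: We prove the second assertion first. Suppose $\alpha \ge \alpha P$. 
Then $\alpha \ge \alpha P^n$ for every $n$. If $\alpha_i = 0$ and $\alpha_j > 0$, find $n$ such that 
$(P^n)_{ji} > 0$. We have 

\[ \alpha_i \ge \sum_m \alpha_m (P^n)_{mi} \ge \alpha_j (P^n)_{ji} > 0, \]

a contradiction. Hence $\alpha > 0$. For the first assertion of the 
proposition, let $\alpha$ and $\beta$ be non-zero non-negative finite-valued 
superregular measures. Then $\alpha$ and $\beta$ are positive. Use $\alpha$ to form the 
recurrent chain $\hat P$. Then $\hat P\mathbf{1} = \mathbf{1}$ since $\hat P$ is recurrent. Therefore, 

\[ \sum_j \frac{\alpha_j P_{ji}}{\alpha_i} = 1. \]

Thus $\sum_j \alpha_j P_{ji} = \alpha_i$ and $\alpha$ is regular. If we can show that $\{\beta_j/\alpha_j\}$ is a 
superregular function for $\hat P$, we will have shown that $\beta = c\alpha$, and the 
proof will be complete. We have 

\[ \sum_j \hat P_{ij}\,\frac{\beta_j}{\alpha_j} = \sum_j \frac{\alpha_j P_{ji}}{\alpha_i}\cdot\frac{\beta_j}{\alpha_j} = \frac{1}{\alpha_i}\sum_j \beta_j P_{ji} \le \frac{\beta_i}{\alpha_i}. \]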






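Proposition 6-5: If $P$ is ergodic, then $A = \mathbf{1}\alpha$, $\alpha\mathbf{1} = 1$, and $\alpha$ is 
regular. 

Proof: We have $PA = A$. Thus every column of $A$ is regular and 
must be constant by Proposition 6-3. Hence $A = \mathbf{1}\alpha$. Since $AP = A$, 
every row of $A$ is regular and $\alpha$ must be regular. It therefore remains to 
be shown that $\alpha\mathbf{1} = 1$. Now $A^2 = A$, so that $(\mathbf{1}\alpha)(\mathbf{1}\alpha) = (\mathbf{1}\alpha)$. By 
associativity $\mathbf{1}(\alpha(\mathbf{1}\alpha)) = \mathbf{1}\alpha$ so that $\alpha(\mathbf{1}\alpha) = \alpha$. But $\alpha(\mathbf{1}\alpha) = (\alpha\mathbf{1})\alpha$ so 
that $(\alpha\mathbf{1})\alpha = \alpha$. If $\alpha_j \ne 0$, then from $(\alpha\mathbf{1})\alpha_j = \alpha_j$ we may conclude 
$\alpha\mathbf{1} = 1$. 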



The existence of a positive regular measure for ergodic chains is thus 
an easy matter to prove. For null chains, however, the proof is harder 
since the limiting matrix L = 0 is no help. The technique we shall use 
is to watch the recurrent chain P only while it is in a subset E of the set 
of states. 

Let $E$ be a subset of states and let $P^E$ be the stochastic process whose 
$n$th outcome is the outcome of $P$ the $n$th time the process $P$ is in the 
set $E$. We shall see in Lemma 6-6 that $P^E$ is a Markov chain. From its 
interpretation it is clear that $P^E$ is recurrent if $P$ is recurrent. 
Moreover, if $E \subset F$, then $(P^F)^E = P^E$. 

The index set for the matrix $P^E$ is taken to be $E$. Writing $P$ as 

\[ P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix}, \]

where the first block of rows and columns is indexed by $E$ and the second 
by its complement $\tilde E$, we have the following relationship between $P^E$ and $P$. 
Lemma 6-6: For an arbitrary Markov chain $P$, $P^E$ is a Markov chain 
and 

\[ P^E = T + UNR, \]

where $N = \sum_{k=0}^{\infty} Q^k$. 

Remark: The lemma holds even if $N$ has infinite entries, provided we 
agree as usual that $0 \cdot \infty = 0$. 

Proof: Let $y_n$ be the $n$th outcome in the $P^E$-process. If 

\[ \Pr_\pi[y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}] > 0, \]

let $t$ be the random time of outcome $n - 1$ and apply Theorem 4-9. 
Then 

\[ \Pr_\pi[y_n = c_n \mid y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}] = \Pr_\pi[y_n = c_n \mid y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1} \wedge x_t = c_{n-1}] = \Pr_{c_{n-1}}[y_1 = c_n], \]

and it follows that $P^E$ is a Markov chain. Now let $i$ and $j$ be in $E$. 
Applying Theorem 4-10 with the random time identically one and with 
the statement that $E$ is hit after time 0 first at state $j$, we have 

\[ P^E_{ij} = \sum_k P_{ik}B^E_{kj} = \sum_{k \in E} P_{ik}B^E_{kj} + \sum_{k \notin E} P_{ik}B^E_{kj} = P_{ij} + \sum_{k \notin E} P_{ik}B^E_{kj}. \]

The result then follows from Proposition 5-6. 
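
For a finite chain in which $I - Q$ is invertible, the formula $P^E = T + UNR$ can be evaluated with $N = (I - Q)^{-1}$. The sketch below is a minimal illustration under that assumption. The reflecting walk on $\{0, 1, 2, 3\}$ and the helper names are hypothetical, and the set $E$ is passed as a list of state indices.

```python
from fractions import Fraction as F


def mat_mul(a, b):
    """Product of two (possibly rectangular) matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination over fractions."""
    n = len(m)
    aug = [list(row) + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def watched_chain(p, e):
    """Return P^E = T + U N R, assuming I - Q is invertible."""
    rest = [s for s in range(len(p)) if s not in e]
    t = [[p[i][j] for j in e] for i in e]
    u = [[p[i][j] for j in rest] for i in e]
    r = [[p[i][j] for j in e] for i in rest]
    q = [[p[i][j] for j in rest] for i in rest]
    i_minus_q = [[F(int(a == b)) - q[a][b] for b in range(len(rest))]
                 for a in range(len(rest))]
    unr = mat_mul(mat_mul(u, inverse(i_minus_q)), r)
    return [[t[a][b] + unr[a][b] for b in range(len(e))]
            for a in range(len(e))]


# Hypothetical reflecting random walk on {0, 1, 2, 3}.
WALK = [[F(0), F(1), F(0), F(0)],
        [F(1, 2), F(0), F(1, 2), F(0)],
        [F(0), F(1, 2), F(0), F(1, 2)],
        [F(0), F(0), F(1), F(0)]]

print(watched_chain(WALK, [0, 3]))
```

Watching this walk only at the endpoints gives a two-state chain that stays put with probability $\tfrac23$ and switches with probability $\tfrac13$, in agreement with the gambler's-ruin probabilities for the interior states.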

Lemma 6-7: For an arbitrary Markov chain $P$, if $E$ is a subset of 
states and $\beta$ is a finite-valued non-negative $P$-superregular measure, 
then $\beta_E$ is $P^E$-superregular. 

Proof: Since $\beta \ge \beta P$, multiplication of the submatrices of $\beta$ by the 
submatrices of $P$ gives the two relations 

\[ \beta_E \ge \beta_E T + \beta_{\tilde E} R \]

and 

\[ \beta_{\tilde E} \ge \beta_E U + \beta_{\tilde E} Q. \]

We may rewrite the second relation as 

\[ \beta_{\tilde E}(I - Q) \ge \beta_E U \ge 0. \]

The proof of Theorem 5-10 translates directly into a proof for row 
vectors. From it we find 

\[ \beta_{\tilde E} = \gamma N + \rho, \]

where $\gamma = \beta_{\tilde E}(I - Q)$, $N = \sum Q^k$, and $\rho$ is non-negative and $Q$-regular. 
Hence 

\[ \beta_{\tilde E} \ge \gamma N \ge (\beta_E U)N. \]

Thus $\beta_E P^E = \beta_E T + \beta_E UNR \le \beta_E T + \beta_{\tilde E} R \le \beta_E$. 



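Lemma 6-8: No finite null chains exist. 

Proof: We have $L^{(n)}\mathbf{1} = \mathbf{1}$, or $\sum_j L^{(n)}_{ij} = 1$ for every $n$. Since the 
limit of a finite sum is the sum of the limits, 

\[ (L\mathbf{1})_i = \sum_j L_{ij} = \sum_j \lim_n L^{(n)}_{ij} = \lim_n \sum_j L^{(n)}_{ij} = 1. \]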

Theorem 6-9: Every recurrent chain $P$ has a positive finite-valued 
regular measure $\alpha$ which is unique up to multiplication by a scalar. 
Furthermore, $\alpha\mathbf{1} < \infty$ if and only if $P$ is ergodic. 



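Proof: Order the states by the positive integers, let $E$ be the first 
$n$ of the states, and let $F$ be the first $n + 1$. Then $P^E$ and $P^F$ are 
ergodic chains and have regular measures $\alpha^E$ and $\alpha^F$. Also $(P^F)^E = P^E$. 
Thus $\alpha^F_E$ is $P^E$-regular by Lemma 6-7, and we may choose $\alpha^F$ such that 
$\alpha^F_E = \alpha^E$ by the uniqueness part of Proposition 6-4. The procedure of 
adding a single state to $F$ may be continued by induction, and we set 
$\alpha = \lim_{E \to S} (\alpha^E \;\; 0)$. Now for any of these sets $E$ we have 

\[ \alpha^E T \le \alpha^E T + \alpha^E UNR = \alpha^E P^E = \alpha^E \]

or 

\[ (\alpha^E \;\; 0)\begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix} \le (\alpha^E \;\; 0). \]

Thus, 

\[ \alpha\begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix} = (\alpha^E T \;\; 0) \le (\alpha^E \;\; 0) \le \alpha. \]

As $E \to S$, the entries of $\begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix}$ increase monotonically from zero to 
the entries of $P$. Hence, by monotone convergence, $\alpha P \le \alpha$ and, by 
Proposition 6-4, $\alpha$ is regular. Clearly $\alpha > 0$, and we know that if $P$ is 
ergodic then $\alpha\mathbf{1} < \infty$. Conversely, suppose $\alpha\mathbf{1} < \infty$. Then, by 
dominated convergence, 

\[ \alpha L = \lim_n \alpha L^{(n)} = \lim_n \frac{1}{n}\,\alpha(I + \cdots + P^{n-1}) = \alpha \ne 0 \]

and $L \ne 0$. 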
2. Duality 

The proof of Proposition 6-4 is somewhat artificial without further 
explanation. What was used was a standard method of converting 
proofs about functions into proofs about measures. We had proved 
uniqueness for non-negative finite-valued superregular functions, and 
the idea was to take advantage of this fact in the result for measures. 

The isomorphism that exists between row vectors and column 
vectors is known as duality. Not only does duality make rigorous the 
correspondence between row and column vectors, but also it provides 
easy proofs of some new results. 

Definition 6-10: Let $P$ be an arbitrary Markov chain transition 
matrix and suppose there exists a positive finite-valued $P$-superregular 
measure $\alpha$. The $\alpha$-dual matrix of $P$ is a matrix $\hat P$ defined by 

\[ \hat P_{ij} = \frac{\alpha_j P_{ji}}{\alpha_i}. \]

Let $D$ be a diagonal matrix with diagonal entries $1/\alpha_i$. 
We note that $\hat P = DP^TD^{-1}$. 

We cannot define duality in general, because we are not always 
assured of the existence of a positive superregular measure. However, 
there are only two important special cases, and we know that a 
superregular measure a exists for each of them : 

(1) $P$ is recurrent. Then there exists a unique $\alpha$-dual of $P$. We call 
$\hat P$ the dual of $P$ or the reverse chain. We shall investigate the 
properties of the reverse chain in some detail in Section 8. 

(2) $P$ has only transient states. Then, as we saw in Section 5-2, a 
positive superregular $\alpha$ exists. All duality statements are relative to 
such a vector $\alpha$, but there is no assurance that $\alpha$ is unique. 

Proposition 6-11: If $P$ is a transition matrix, then so is $\hat P$. If all 
pairs of states in $P$ communicate, then all pairs of states in $\hat P$ 
communicate. 

Proof: It is clear that $\hat P \ge 0$. For $\hat P\mathbf{1} \le \mathbf{1}$ we have 

\[ \sum_j \hat P_{ij} = \sum_j \frac{\alpha_j P_{ji}}{\alpha_i} = \frac{1}{\alpha_i}\sum_j \alpha_j P_{ji} \le \frac{\alpha_i}{\alpha_i} = 1. \]

If $i$ and $j$ communicate in $P$ by the routes 

\[ i, m_1, m_2, \ldots, m_r, j \]

and 

\[ j, n_1, n_2, \ldots, n_s, i, \]

then they communicate in $\hat P$ by 

\[ i, n_s, \ldots, n_1, j \]

and 

\[ j, m_r, \ldots, m_1, i. \]

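Proposition 6-12: If $P$ is a transition matrix, then 

\[ \hat P^n = D(P^n)^TD^{-1} \quad \text{and} \quad \{\hat M_i[n_j]\} = D\{M_i[n_j]\}^TD^{-1}. \]

If $P$ is either (1) recurrent or (2) transient with only transient states, 
then $\hat P$ is of the same type. If $P$ is of the second type, then 

\[ \hat N = DN^TD^{-1}. \]

Proof: The proof of the first assertion is by induction on $n$. The 
case $n = 1$ is Definition 6-10. Suppose that 

\[ \hat P^{k-1} = D(P^{k-1})^TD^{-1}. \]

Then 

\[ \hat P^k = \hat P\hat P^{k-1} = (DP^TD^{-1})(D(P^{k-1})^TD^{-1}) = D(P^k)^TD^{-1}. \]

Associativity holds because all the matrices are non-negative. Now 

\[ \{\hat M_i[n_j]\} = \sum_k \hat P^k = \sum_k D(P^k)^TD^{-1} = D\{M_i[n_j]\}^TD^{-1}. \]

In particular, if $M_j[n_j]$ is infinite, then so is $\hat M_j[n_j]$. Hence, by 
Proposition 6-11, if $P$ is recurrent, so is $\hat P$. 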

Definition 6-13: Let $P$ be a transition matrix, and let $\alpha$ be a positive 
finite-valued superregular measure. Let $Y$ be any square matrix, let 
$\beta$ be any row vector, and let $f$ be any column vector all indexed by the 
set of states. Define $D$ to be a diagonal matrix whose diagonal entries 
are $1/\alpha_i$. The duals of $Y$, $\beta$, and $f$ are defined by 

\[ \text{dual } Y = DY^TD^{-1}, \qquad \text{dual } \beta = D\beta^T, \qquad \text{dual } f = f^TD^{-1}. \]

The dual of a number is that number. 

We see that the dual of a row vector is a column vector and that the 
dual of a column vector is a row vector. The reader should note that $\hat P$ 
is identical with dual $P$ and that part of the content of Proposition 
6-12 is that $M_i[n_j]$ transforms to the $\hat P$ chain in the same way that $P_{ij}$ 
does: 

\[ \{\hat M_i[n_j]\} = \text{dual } \{M_i[n_j]\}. \]

The fundamental properties of duals are listed in the next proposition. 
Proposition 6-14: Let $X$ and $Y$ be square matrices, row vectors, or 
column vectors indexed by the set of states for a Markov chain $P$. 
Suppose $\hat P$ is the $\alpha$-dual of $P$. Then 

(1) dual dual $X = X$. 

(2) dual $(X + Y)$ = dual $X$ + dual $Y$. 

(3) dual $(cX)$ = $c$ dual $X$. 

(4) dual $(XY)$ = dual $Y$ dual $X$. 

(5) dual $I = I$ and dual $0 = 0$. 

(6) If $X \ge 0$, then dual $X \ge 0$; and if $X > 0$, then dual $X > 0$. 

(7) If $X \ge Y$, then dual $X \ge$ dual $Y$; and if $X > Y$, then 
dual $X >$ dual $Y$. 

(8) If $f$ is a $P$-superregular (or subregular) column vector, then 
dual $f$ is a $\hat P$-superregular (or subregular) row vector; and if $\beta$ 
is a $P$-superregular (or subregular) row vector, then dual $\beta$ is a 
$\hat P$-superregular (or subregular) column vector. 

(9) dual $\mathbf{1} = \alpha$ and dual $\alpha = \mathbf{1}$. The measure $\alpha$ is $\hat P$-superregular. 

(10) If $\lim_n X^{(n)} = X$, then $\lim_n$ dual $X^{(n)}$ = dual $X$. 

Proof: We shall prove only (1) and (4); the rest of the proof is left 
to the reader. For (1) we have 

\[ \text{dual dual } X = \text{dual}\,(DX^TD^{-1}) = D(DX^TD^{-1})^TD^{-1} = DD^{-1}X^{TT}DD^{-1} = X. \]

Associativity holds because $D$ and $D^{-1}$ are diagonal matrices. 

For (4) we have 

\[ \text{dual } XY = D(XY)^TD^{-1} = DY^TX^TD^{-1} = (DY^TD^{-1})(DX^TD^{-1}) = \text{dual } Y \text{ dual } X. \]
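
Properties (1) and (4) can be checked numerically for square matrices. The sketch below is an illustration only: the measure `ALPHA` and the matrices `X` and `Y` are hypothetical values, and `dual` implements the entrywise form $(\text{dual } X)_{ij} = X_{ji}\,\alpha_j/\alpha_i$ of $DX^TD^{-1}$ with $D$ having diagonal entries $1/\alpha_i$.

```python
from fractions import Fraction as F

# Hypothetical positive measure used to define the dual.
ALPHA = [F(2), F(2), F(1)]


def dual(x, alpha=ALPHA):
    """dual X = D X^T D^{-1} for a square matrix X, with D = diag(1/alpha_i)."""
    n = len(x)
    return [[x[j][i] * alpha[j] / alpha[i] for j in range(n)]
            for i in range(n)]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Arbitrary test matrices (not transition matrices).
X = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
Y = [[0, 1, 1], [2, 0, 0], [1, 1, 0]]

INVOLUTION_OK = dual(dual(X)) == X                                # property (1)
REVERSAL_OK = dual(mat_mul(X, Y)) == mat_mul(dual(Y), dual(X))    # property (4)
print(INVOLUTION_OK, REVERSAL_OK)
```

Both checks hold for any positive `ALPHA`, since the factors $\alpha_k$ introduced by one dual cancel against those of the other.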

We may summarize Proposition 6-14 by saying that the operation 
dual is its own inverse, it reverses products, and it preserves sums, 
equalities, inequalities, regularity, and limits. We know, for example, 
that the dual of a recurrent chain is recurrent, and since dual is one-one, 
a dual recurrent chain is the most general recurrent chain. Hence a 
proof “for all recurrent chains $\hat P$” is a proof for all recurrent chains. 

The essential feature of duality lies in this last statement; we shall 
apply it to the proof of Proposition 6-4. We start with a recurrent chain 
$P$ and two positive superregular measures $\alpha$ and $\beta$. Forming $\hat P$, the 
$\alpha$-dual of $P$, we observe that since $\hat P$ is the most general recurrent chain 
and since $\mathbf{1}$ is $\hat P$-regular, the dual of $\mathbf{1}$ must be $P$-regular. Thus $\alpha$ is 
regular. Now since $\beta$ is $P$-superregular and non-negative, dual $\beta$ is 
$\hat P$-superregular and non-negative. Hence dual $\beta$ is a constant vector, and 
the proof that $\beta$ is a constant multiple of $\alpha$ is complete. 

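To form the $\alpha$-dual of the restriction of a matrix, we use the 
appropriate restrictions of $D$ and $D^{-1}$ which make the matrix products 
defined. For example, write 

\[ P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix}, \]

with the first block of rows and columns indexed by $E$ and the second by $\tilde E$. 
By definition, 

\[ \text{dual } T = D_ET^TD_E^{-1}, \qquad \text{dual } U = D_{\tilde E}U^TD_E^{-1}, \]

\[ \text{dual } R = D_ER^TD_{\tilde E}^{-1}, \qquad \text{dual } Q = D_{\tilde E}Q^TD_{\tilde E}^{-1}. \]

Note that 

\[ \begin{pmatrix} \hat T & \hat U \\ \hat R & \hat Q \end{pmatrix} = \hat P = DP^TD^{-1} = \begin{pmatrix} D_E & 0 \\ 0 & D_{\tilde E} \end{pmatrix}\begin{pmatrix} T^T & R^T \\ U^T & Q^T \end{pmatrix}\begin{pmatrix} D_E^{-1} & 0 \\ 0 & D_{\tilde E}^{-1} \end{pmatrix} = \begin{pmatrix} D_ET^TD_E^{-1} & D_ER^TD_{\tilde E}^{-1} \\ D_{\tilde E}U^TD_E^{-1} & D_{\tilde E}Q^TD_{\tilde E}^{-1} \end{pmatrix}, \]

so that 

\[ \text{dual } T = \hat T, \qquad \text{dual } U = \hat R, \qquad \text{dual } R = \hat U, \qquad \text{dual } Q = \hat Q. \]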



To make effective use of duality, it is convenient to know what 
interpretation, if any, the duals of the matrices associated with $P$ have 
in terms of the $\hat P$-process. At this time we shall calculate the duals of 
${}^EP$, $P^E$, and $B^E$. 

If we let ${}^EP$ be the process $P$ watched until it enters $E$, then ${}^EP$ has 
transition matrix 

\[ {}^EP = \begin{pmatrix} 0 & 0 \\ 0 & Q \end{pmatrix} \]

and fundamental matrix 

\[ {}^EN = \begin{pmatrix} 0 & 0 \\ 0 & \sum Q^k \end{pmatrix}. \]

Hence dual ${}^EP = {}^E\hat P$ and dual ${}^EN = {}^E\hat N$. 

The duals of $P^E$ and $B^E$ are not so trivial to settle, and we shall 
state what they are as the next two propositions. 



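Proposition 6-15: The dual of $P^E$ is $\hat P^E$. 

Proof: By Lemma 6-6, 

\[ P^E = T + U\bigl(\textstyle\sum Q^k\bigr)R. \]

Hence 

\[ \text{dual } P^E = \text{dual } T + (\text{dual } R)\bigl(\textstyle\sum (\text{dual } Q)^k\bigr)(\text{dual } U) = \hat T + \hat U\bigl(\textstyle\sum \hat Q^k\bigr)\hat R = \hat P^E. \]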



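Proposition 6-16: 

\[ (\text{dual } B^E)_{ij} = \begin{cases} \hat M_i[{}^E\bar n_j] & \text{if } i \in E \\ 0 & \text{if } i \notin E, \end{cases} \]

where ${}^E\bar n_j$ is the number of times that the $\alpha$-dual process started at $i$ 
is in $j$ before returning to $E$. 

Proof: Let $N = \sum Q^k$. Then 

\[ B^E = \begin{pmatrix} I & 0 \\ NR & 0 \end{pmatrix} \]

so that 

\[ \text{dual } B^E = \begin{pmatrix} I & \hat U\hat N \\ 0 & 0 \end{pmatrix}. \]

Thus if $i \notin E$, then $(\text{dual } B^E)_{ij} = 0$, and if $i$ and $j$ are in $E$, then 
$(\text{dual } B^E)_{ij} = \delta_{ij}$. 

If $i \in E$ and $j \notin E$, then the result that 

\[ (\hat U\hat N)_{ij} = \hat M_i[{}^E\bar n_j] \]

follows from Theorem 4-11 with the random time identically one. 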
If we define ${}^E\bar N$ to be a matrix indexed by $E$ and $S$ whose $ij$ entry 
is the mean number of times starting in $i$ that the process is in $j$ before 
returning to $E$, then we may rewrite the result of Proposition 6-16 as 

\[ \text{dual } B^E = \begin{pmatrix} {}^E\hat{\bar N} \\ 0 \end{pmatrix}. \]

The rest of this section contains applications of Propositions 6-15 
and 6-16. We begin by deriving two identities relating $B^E$ to other 
matrices, and we shall then dualize the first identity to obtain a result 
which will be used in Chapters 8 and 9. Finally we shall apply 
Proposition 6-16 in a different way to get a probabilistic interpretation 
for $\alpha_j/\alpha_i$. 

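Proposition 6-17: For any set $E$, 

\[ (I - P)B^E = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}. \]

If $N^{(n)} = I + P + \cdots + P^n$, then 

\[ B^EN^{(n)} = N^{(n)} + {}^EN(P^{n+1} - I). \]

Proof: Set $N = \sum Q^n$. For the first identity we have 

\[ \begin{pmatrix} I - T & -U \\ -R & I - Q \end{pmatrix}\begin{pmatrix} I & 0 \\ NR & 0 \end{pmatrix} = \begin{pmatrix} I - (T + UNR) & 0 \\ -R + (I - Q)NR & 0 \end{pmatrix} = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} \]

since $(I - Q)N = I$. For the second identity, we have 

\[ {}^EN = \begin{pmatrix} 0 & 0 \\ 0 & N \end{pmatrix} \]

and hence 

\[ {}^ENP = \begin{pmatrix} 0 & 0 \\ NR & N - I \end{pmatrix} \]

and 

\[ {}^EN(P - I) = \begin{pmatrix} 0 & 0 \\ NR & -I \end{pmatrix} = B^E - I. \]

Therefore ${}^EN(P^{n+1} - I) = [{}^EN(P - I)]N^{(n)} = (B^E - I)N^{(n)}$. 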
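Proposition 6-18: For any set $E$ 

\[ \begin{pmatrix} {}^E\bar N \\ 0 \end{pmatrix}(I - P) = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}. \]

Proof: Apply duality to the first identity of Proposition 6-17, using 
Propositions 6-15 and 6-16. Then 

\[ \begin{pmatrix} {}^E\hat{\bar N} \\ 0 \end{pmatrix}(I - \hat P) = \begin{pmatrix} I - \hat P^E & 0 \\ 0 & 0 \end{pmatrix}. \]

Since this identity holds for all reverse processes $\hat P$, it holds for all 
processes. 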



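Using Proposition 6-16, we can obtain a simple interpretation for the 
ratio $\alpha_j/\alpha_i$. The case in which $P$ is recurrent is of special importance 
because $\alpha$ is unique up to multiplication by a constant. But first we 
prove a more general result. 

Proposition 6-19: Let $\alpha$ be a positive finite-valued superregular 
measure for $P$, and let $\hat P$ be the $\alpha$-dual of $P$. Then for any set $E$, 

\[ \sum_{i \in E} \alpha_i\,{}^E\bar N_{ij} = \alpha_j\hat h^E_j. \]

Proof: By Proposition 6-16, 

\[ (\text{dual } \hat B^E)_{ij} = \begin{cases} {}^E\bar N_{ij} & \text{for } i \in E \\ 0 & \text{for } i \notin E. \end{cases} \]

Therefore 

\[ (\text{dual } \hat h^E)_j = [\text{dual}\,(\hat B^E\mathbf{1})]_j = [\alpha(\text{dual } \hat B^E)]_j = \sum_{i \in E} \alpha_i\,{}^E\bar N_{ij}. \]

Since $(\text{dual } \hat h^E)_j = \alpha_j\hat h^E_j$ by Definition 6-13, the result follows. 

Corollary 6-20: Let $\alpha$ be a positive finite-valued superregular measure 
for $P$, and let $\hat P$ be the $\alpha$-dual of $P$. Then 

\[ {}^i\bar N_{ij} = \frac{\alpha_j\hat H_{ji}}{\alpha_i}. \]

Proof: Set $E = \{i\}$ in Proposition 6-19. 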

In particular, for any such a. 
Corollary 6-21: Let $\alpha$ be a positive finite-valued regular measure for 
a recurrent chain $P$. Then 

\[ {}^i\bar N_{ij} = \frac{\alpha_j}{\alpha_i}. \]

Proof: Since $P$ is recurrent, so is $\hat P$. Thus $\hat H_{ji} = 1$, and we may 
apply Corollary 6-20. 

Corollary 6-22: Let $\alpha$ be a positive finite-valued regular measure for 
a recurrent chain $P$. Then 

\[ \sum_{i \in E} \alpha_i\,{}^E\bar N_{ij} = \alpha_j. \]

Proof: Apply Proposition 6-19. Then $\hat h^E_j = 1$. 

Definition 6-23: Let $P$ be a recurrent chain. Set 

\[ \bar M_{ij} = M_i[\bar t_j] \]

and 

\[ M_{ij} = M_i[t_j]. \]

The matrix $M$ is called the mean first passage time matrix. Similarly 
$M_{iE}$ is the mean time from $i$ until $E$ is reached, and $\bar M_{iE}$ is the mean 
time from $i$ to return to $E$. 



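Proposition 6-24: If $P$ is a recurrent chain with positive regular 
measure $\alpha$, then 

\[ \sum_{i \in E} \alpha_i\bar M_{iE} = \alpha\mathbf{1}. \]

Proof: We have 

\[ \sum_{i \in E} \alpha_i\bar M_{iE} = \sum_{i \in E}\sum_j \alpha_i\,{}^E\bar N_{ij} = \sum_j\sum_{i \in E} \alpha_i\,{}^E\bar N_{ij} = \sum_j \alpha_j = \alpha\mathbf{1}, \]

the next to last equality following from Corollary 6-22. 

Proposition 6-25: If $P$ is a recurrent chain with positive regular 
measure $\alpha$, then 

\[ \bar M_{ii} = \begin{cases} \dfrac{1}{\alpha_i} & \text{if } P \text{ is ergodic and } \alpha\mathbf{1} = 1 \\[6pt] +\infty & \text{if } P \text{ is null.} \end{cases} \]

Proof: Set $E = \{i\}$ in Proposition 6-24. 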
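
The relation between mean return times and the regular measure can be checked on a small ergodic chain. The sketch below assumes a hypothetical three-state chain `P` with normalized regular measure `ALPHA = (2/5, 2/5, 1/5)`. It computes the mean return time to each state by first-step analysis, solving $(I - Q)m = \mathbf{1}$ for the mean hitting times from the other states, and compares the result with $1/\alpha_i$. The helper names are illustrative.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def mean_return_time(p, i):
    """Mean time to return to state i, by first-step analysis."""
    others = [s for s in range(len(p)) if s != i]
    # Mean hitting times m_k of i from k != i satisfy (I - Q) m = 1,
    # where Q is p restricted to the states other than i.
    a = [[F(int(k == l)) - p[k][l] for l in others] for k in others]
    m = solve(a, [F(1)] * len(others))
    return 1 + sum(p[i][k] * m_k for k, m_k in zip(others, m))


# Hypothetical ergodic chain on {0, 1, 2} with ALPHA P = ALPHA and sum(ALPHA) = 1.
P = [[F(0), F(1), F(0)],
     [F(1, 2), F(0), F(1, 2)],
     [F(1), F(0), F(0)]]
ALPHA = [F(2, 5), F(2, 5), F(1, 5)]

RETURN_TIMES = [mean_return_time(P, i) for i in range(3)]
print(RETURN_TIMES, [1 / a for a in ALPHA])
```

Here the mean return times are $\tfrac52$, $\tfrac52$, and $5$, the reciprocals of the entries of `ALPHA`.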
3. Cyclicity 

Let $P$ be a recurrent Markov chain and let $i$ be a fixed state of the 
chain. Define a set of positive integers $T$ by 

\[ T = \{k \mid (P^k)_{ii} > 0,\ k > 0\}. \]

Let $d$ be the greatest common divisor of the integers in $T$. 

Lemma 6-26: $T$ is non-empty and is closed under addition. 

Proof: $T$ is clearly non-empty since $P$ is recurrent. Suppose $m$ and 
$n$ are integers in $T$. Then $(P^m)_{ii} > 0$ and $(P^n)_{ii} > 0$, so that 

\[ (P^{m+n})_{ii} = \sum_k (P^m)_{ik}(P^n)_{ki} \ge (P^m)_{ii}(P^n)_{ii} > 0. \]

Hence $m + n$ is in $T$. 

Noting the discussion in Section 1-6e, we arrive at the following 
result, using Lemma 1-66. 

Lemma 6-27: $T$ contains all sufficiently large multiples of its greatest 
common divisor $d$. 

The integer $d$ we shall call the period of the chain for the state $i$. 

Proposition 6-28: The period of a recurrent chain for the state i is a 
constant independent of the state i. 

Proof: Let $i$ and $j$ be any two states in the chain. Since the chain 
is recurrent, $i$ and $j$ communicate. Let $d$ be the period associated with 
state $i$ and let $d'$ be the period associated with state $j$. Suppose the 
minimum possible time for the process to go from state $i$ to state $j$ is $s$, 
and suppose the minimum time for the process to go from $j$ to $i$ is $t$. 
By Lemma 6-27 let $N$ be large enough so that the process can return to 
$j$ in $nd'$ steps for all $n \ge N$. Then the process can go from $i$ to $j$ in $s$ 
steps, return to $j$ in $Nd'$ steps, and go back to $i$ in $t$ steps. Hence 
$d \mid (s + Nd' + t)$. Similarly, $d \mid (s + (N + 1)d' + t)$. Thus $d$ divides 
the difference, or $d \mid d'$. Reversing the roles of $i$ and $j$, we find that 
$d' \mid d$. Therefore, $d = d'$. 

We may thus speak of the period of a recurrent chain without 
ambiguity. Every recurrent chain has a period, and that period is 
finite. If, for example, $P$ is a recurrent chain in which $P_{ii} > 0$ for 
some $i$, then $P$ has period one. 

Definition 6-29: A recurrent chain is said to be non-cyclic if its period 
is one and cyclic if its period is greater than one. 

Let $P$ be a recurrent chain of period $d$. Define a relation $R$ on the 
states of $P$ by the following: We say that $i \mathrel{R} j$ if and only if, starting at 
$i$, the process can reach $j$ in $md$ steps for some $m$. From the definition 
of the period $d$, it follows that $i \mathrel{R} i$. The symmetry of $R$ follows from 
the fact that $md$ plus the time to return from $j$ to $i$ must be a multiple of $d$. 
To see that $R$ is transitive, we note that if $j$ can be reached from $i$ in $md$ 
steps and if $k$ can be reached from $j$ in $nd$ steps, then $k$ can be reached 
from $i$ in $(m + n)d$ steps. 

Thus R partitions the states into cyclic subclasses. The reader may 
verify that there are d distinct subclasses and that the nth class contains 
all those states which it is possible to reach from the starting state only 
at times which are congruent to n modulo d. The process moves 
cyclically through the classes in the specified order. Furthermore, if 
the chain is watched after every dth step, the resulting process is again 
a Markov chain (by the strong Markov property), and its behavior will 
be noncyclic. The transition matrix for the new process is $P^d$, and its 
form is that of $d$ separate recurrent chains: 

\[ P^d = \begin{pmatrix} \ast & & 0 \\ & \ddots & \\ 0 & & \ast \end{pmatrix} \qquad (d \text{ blocks}). \]

The entries in each block are the entries of a recurrent noncyclic chain, 
and the entries which are not in any block are all zeros. 

The observation that $P^d$ is really $d$ separate recurrent noncyclic 
chains enables us to study representatively the properties of all 
recurrent chains by considering only noncyclic chains. Thus, it is to 
noncyclic chains that we now turn our attention. The main tool 
in their study will be chains representing sums of independent random 
variables. 
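
For a finite chain the period of a state can be estimated by collecting the return times $k$ with $(P^k)_{ii} > 0$ and taking their greatest common divisor. The sketch below is an illustration under stated assumptions: it inspects only times up to a finite `horizon` (assumed large enough for the chain at hand), tracks which states are reachable rather than the probabilities themselves, and uses hypothetical example chains.

```python
from math import gcd


def return_times(p, i, horizon):
    """Times k in 1..horizon at which a return to state i is possible."""
    n = len(p)
    reachable = {i}  # states reachable from i in exactly k steps
    times = []
    for k in range(1, horizon + 1):
        reachable = {j for s in reachable for j in range(n) if p[s][j] > 0}
        if i in reachable:
            times.append(k)
    return times


def period(p, i, horizon=50):
    """Greatest common divisor of the possible return times up to horizon."""
    d = 0
    for k in return_times(p, i, horizon):
        d = gcd(d, k)  # gcd(0, k) == k, so the first return time seeds d
    return d


# Hypothetical examples: a deterministic 3-cycle, a reflecting walk on four
# states, and a two-state chain with a positive diagonal entry.
CYCLE = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
WALK = [[0, 1, 0, 0], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 1, 0]]
LAZY = [[0.5, 0.5], [1, 0]]

print(period(CYCLE, 0), period(WALK, 0), period(LAZY, 0))
```

The third chain has a positive diagonal entry and so is non-cyclic, in line with the remark following Proposition 6-28.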
4. Sums of independent random variables 

We have already investigated some of the properties of sums of 
independent random variables Markov chains. Such processes are 
especially important because of how they arise from general recurrent 
chains (see Proposition 6-32), and it is for this reason that we now 
discuss their origin. 

For concreteness we shall confine ourselves to sums of independent 
random variables chains defined on the integers. Before recalling the 
definition of independent random variables, we remark that if y is a 
real-valued function defined on a probability space and having a 
denumerable range, then a necessary and sufficient condition for y to 
be measurable (and hence to be a random variable) is that the inverse 
image under $y$ of every one-point set be measurable. The condition is 
necessary because $\{\omega \mid y(\omega) = c\} = \{\omega \mid c \le y(\omega) \le c\}$ must be 
measurable, and it is sufficient because $\{\omega \mid y(\omega) < c\}$ is a countable union of 
such sets. Therefore, if $y$ is a denumerable-valued random variable 
and if $E$ is an arbitrary set of real numbers, the set $\{\omega \mid y(\omega) \in E\}$ is 
measurable. 

Definition 6-30: The denumerable-valued random variables $y_1, y_2, 
y_3, \ldots$ defined on $\Omega$ are independent if, for every finite collection of sets 
$E_1, E_2, \ldots, E_m$ of reals, it is true that 

\[ \Pr[y_{n_k}(\omega) \in E_k \text{ for } k = 1, \ldots, m] = \prod_{k=1}^{m} \Pr[y_{n_k}(\omega) \in E_k]. \]

The random variables are identically distributed if, for any $m$ and $n$ and 
for any set $E$ of reals, it is true that 

\[ \Pr[y_m(\omega) \in E] = \Pr[y_n(\omega) \in E]. \]

An independent process $\{y_n\}$ was defined in Section 2-5 as one in 
which the statements $y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}$ and $y_n = c_n$ are 
probabilistically independent for every $n > 0$ and for every choice of 
the $c$'s. We see that an independent process is that special case of a 
Markov process in which $\Pr_\pi[y_{n+1} = j \mid y_n = i]$ is independent of $i$. 
Moreover, an independent process is a Markov chain if and only if it is 
an independent trials process. 

Proposition 6-31: Let $\{y_n\}$ be a stochastic process defined from a 
sequence space $\Omega$ to a denumerable set of real numbers $S$. The 
stochastic process is an independent process if and only if the $\{y_n\}$ 
are independent random variables. It is an independent trials process 
if and only if the $\{y_n\}$ are independent and identically distributed. 
Proof: We are to prove first that the $\{y_n\}$ are independent if and only 
if the statements $y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}$ and $y_n = c_n$ are 
probabilistically independent for any $n \ge 1$ and for any choice of the 
$c$'s. Independence of the $y$'s means 

\[ \Pr[y_0 = c_0 \wedge \cdots \wedge y_n = c_n] = \prod_{k=0}^{n} \Pr[y_k = c_k] \]

and 

\[ \Pr[y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}] = \prod_{k=0}^{n-1} \Pr[y_k = c_k] \]

for all $n$. This statement holds if and only if 

\[ \Pr[y_0 = c_0 \wedge \cdots \wedge y_n = c_n] = \Pr[y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}]\,\Pr[y_n = c_n] \]

for all $n$. Second, we are to prove that the $\{y_n\}$ are also identically 
distributed if and only if 

\[ \Pr[y_n = c] = \Pr[y_m = c] \]

for any $n$ and $m$. But this assertion is clear from Definition 6-30. 

Let $\{y_n\}$ for $n \ge 0$ be a sequence of independent random variables 
which are identically distributed for $n \ge 1$ and which have range in the 
union of the integers and $\{-\infty, +\infty\}$, and define inductively 

\[ x_0 = y_0, \qquad x_{n+1} = y_{n+1} + x_n \quad \text{for } n \ge 0. \]

If the $y_n$ are finite-valued a.e., we claim that the random variables $x_n$ 
are the outcome functions for a sums of independent random variables 
process on the integers with starting distribution $\pi_i = \Pr[y_0 = i]$. 
Setting 

\[ p_k = \Pr[y_n = k], \quad n \ge 1, \]

which is a constant not depending on any other function in the sequence 
(by independence) and not depending on $n$ (by identical distributions), 
we see that $\sum_k p_k = 1$ since $y_n$ is finite-valued a.e. Moreover, if 
$\Pr[x_0 = a \wedge \cdots \wedge x_{n-1} = i] > 0$ with $n > 0$, then 

\[ \Pr[x_n = j \mid x_0 = a \wedge x_1 = b \wedge \cdots \wedge x_{n-1} = i] = \Pr[y_n = j - i \mid y_0 = a \wedge y_1 = b - a \wedge \cdots \wedge y_{n-1} = i - h] = \Pr[y_n = j - i] = p_{j-i}. \]
Hence the process is a Markov chain representing sums of independent 
random variables. 

Conversely, let $P$ be the transition matrix for a sums of independent 
random variables Markov chain with state space the integers and with 
outcome functions $x_n$. Let $y_n = x_n - x_{n-1}$ for $n > 0$. We shall show 
that $x_0$ and the $y_n$ are independent and that the $y_n$ are identically 
distributed; it is clear that $x_0$ and the $y_n$ are finite-valued a.e. In fact, 
we have 

\[ \Pr_\pi[x_0 = c_0 \wedge (y_k = c_k \text{ for } 1 \le k \le n)] = \pi_{c_0}P_{c_0,\,c_0+c_1}\cdots P_{c_0+\cdots+c_{n-1},\,c_0+\cdots+c_n} = \pi_{c_0}\,p_{c_1}\cdots p_{c_n} = \Pr_\pi[x_0 = c_0]\cdot\prod_{k=1}^{n} \Pr_\pi[y_k = c_k], \]

and independence follows by taking countable disjoint unions of such 
statements; since 

\[ \Pr_\pi[y_n = j] = p_j, \]

the $y_n$ are identically distributed. 

Sums of independent random variables appear in a natural way in 
the study of recurrent chains. The result to follow associates to every 
recurrent chain P a sums of independent random variables chain P* 
with state space the integers. 

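Proposition 6-32: Let $P$ be a recurrent chain with outcome functions
$x_n$. For a fixed state $s$ let $t_n(\omega)$ be the $(n + 1)$st time on the path $\omega$
that state $s$ is reached. Then the random times $t_n$ for $n \ge 0$ are the
outcome functions for a sums of independent random variables ladder
process $P^*$ with state space the integers.

Proof: If $\Pr_\pi[t_0 = c_0 \wedge \cdots \wedge t_{n-1} = c_{n-1}] > 0$, then

\[ \begin{aligned}
\Pr_\pi[t_n = c_n &\mid t_0 = c_0 \wedge \cdots \wedge t_{n-1} = c_{n-1}] \\
 &= \Pr_\pi[x_{c_{n-1}+1} \ne s \wedge \cdots \wedge x_{c_n - 1} \ne s \wedge x_{c_n} = s \mid x_0 \ne s \\
 &\qquad \wedge \cdots \wedge x_{c_0 - 1} \ne s \wedge x_{c_0} = s \wedge x_{c_0 + 1} \ne s
   \wedge \cdots \wedge x_{c_{n-1}} = s] \\
 &= \Pr_s[x_1 \ne s \wedge \cdots \wedge x_{c_n - c_{n-1}} = s] \quad \text{by Theorem 4-9} \\
 &= \Pr_s[\bar t_s = c_n - c_{n-1}],
\end{aligned} \]

where $\bar t_s$ is the time to return to state $s$. Hence

\[ \Pr_\pi[t_n = j \mid t_0 = a \wedge \cdots \wedge t_{n-1} = i] = \bar F_{ss}^{(j-i)}. \]

Set $P^*_{ij} = \bar F_{ss}^{(j-i)}$. Since by recurrence of $P$

\[ \sum_j P^*_{ij} = \sum_j \bar F_{ss}^{(j-i)} = 1, \]

$P^*$ is a sums of independent random variables Markov chain.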

The following is a converse to the preceding result. 

Proposition 6-33 : Let P' be a sums of independent random variables 
ladder process on the integers. There exists a recurrent chain P and a 
state s such that the times to return to s are the outcome functions 
for P'. 



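Proof: Let $f_k = P'_{0k}$ for $k \ge 1$; then $\sum_{k=1}^{\infty} f_k = 1$. We take $P$ to
be a basic example and $s$ to be state 0; the values of $p_i$ and $q_i$ in the
basic example are yet to be specified. Define recursively the $q_n$ by the
relations

\[ f_1 = q_1 = \beta_0 - \beta_1 \]

and

\[ f_n = p_1 \cdots p_{n-1} q_n = \beta_{n-1} q_n = \beta_{n-1} - \beta_n. \]

In $P$ we have $\Pr[t_1 - t_0 = k] = f_k$ as required; it remains to be proved
that $P$ is recurrent. We have

\[ \sum_{k=1}^{m} f_k = \sum_{k=1}^{m} (\beta_{k-1} - \beta_k) = \beta_0 - \beta_m = 1 - \beta_m \]

and since $\sum_{k=1}^{\infty} f_k = 1$, we must have $\lim_m \beta_m = 0$. Hence $P$ is
recurrent.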
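The construction in this proof can be traced numerically. The following is only a sketch under stated assumptions: the return-time law $f_k = 2^{-k}$ (truncated at $k = 10$) and the helper name `basic_example_from_f` are illustrative choices, not part of the text. It recovers $\beta_n$, $p_n$, and $q_n$ from the $f_n$ and lets one confirm $f_n = \beta_{n-1} q_n = \beta_{n-1} - \beta_n$.

```python
from fractions import Fraction

def basic_example_from_f(f):
    """Given f[1..m] (f[0] is unused and must be 0), return (beta, p, q)
    with beta_0 = 1, f_n = beta_{n-1} * q_n and beta_n = beta_{n-1} - f_n."""
    beta = [Fraction(1)]      # beta_0 = 1
    p, q = [None], [None]     # index 0 unused, so p[n] is p_n and q[n] is q_n
    for n in range(1, len(f)):
        # q_n is determined by f_n = beta_{n-1} q_n (irrelevant once beta_{n-1} = 0)
        q_n = f[n] / beta[n - 1] if beta[n - 1] != 0 else Fraction(0)
        q.append(q_n)
        p.append(1 - q_n)
        beta.append(beta[n - 1] - f[n])
    return beta, p, q

# Hypothetical return-time law: f_k = 2^{-k} for k = 1, ..., 10 (a truncation).
f = [Fraction(0)] + [Fraction(1, 2 ** k) for k in range(1, 11)]
beta, p, q = basic_example_from_f(f)

# Partial sums of f equal 1 - beta_m, as in the proof.
partial_sum = sum(f[1:11])
```

For this choice every $q_n$ equals $1/2$, and the partial sums of the $f_k$ approach 1 exactly as fast as $\beta_m$ approaches 0.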

We close this section with two remarks about sums of independent 
random variables and their relation to recurrence. First, we have seen 
in Proposition 5-22 that a sums of independent random variables
process on the integers with finitely many $k$-values is a recurrent chain
if and only if the $k$-values have mean zero and their greatest common
divisor is one. Second, we note that an infinite recurrent chain
representing sums of independent random variables must be null, since
$\alpha = \mathbf{1}^T$ is regular and $\alpha\mathbf{1} = \infty$.

5. Convergence theorem for noncyclic chains 

By restricting our attention to noncyclic recurrent chains, we can
prove a stronger result than the Mean Ergodic Theorem, namely that
$P^n$ itself converges with $n$ to a limiting matrix. We shall give two
proofs of this convergence theorem: the first a matrix proof using sums
of independent random variables and the second the classical proof
using the Renewal Theorem of Section 1-6f. We shall further show
that, conversely, the truth of the convergence theorem just when $P$ is
the basic example implies the full validity of the Renewal Theorem.

We begin by proving two lemmas needed in both proofs of the 
convergence theorem; their effect is to formulate noncyclicity in a 
number-theoretic way. 

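Lemma 6-34: For any Markov chain and any states $i$ and $j$,

\[ P^{(0)}_{ij} = \delta_{ij}, \]

\[ \bar F^{(0)}_{ij} = 0, \]

and

\[ P^{(n)}_{ij} = \sum_{k=1}^{n} \bar F^{(k)}_{ij} P^{(n-k)}_{jj}
   = \sum_{k=0}^{n-1} \bar F^{(n-k)}_{ij} P^{(k)}_{jj} \quad \text{for } n \ge 1. \]

Proof: The first two statements are obvious; for the third we first
note that if $\Pr_i[\bar t_j = k] > 0$, then for $n \ge k$,

\[ \begin{aligned}
\Pr_i[x_n = j \mid \bar t_j = k]
 &= \Pr_i[x_n = j \mid x_k = j \wedge x_{k-1} \ne j \wedge \cdots \wedge x_1 \ne j] \\
 &= \Pr_i[x_n = j \mid x_k = j] \quad \text{by Lemma 4-6} \\
 &= \Pr_j[x_{n-k} = j] \quad \text{by Lemma 4-6} \\
 &= P^{(n-k)}_{jj}.
\end{aligned} \]

Hence no matter what the value of $\Pr_i[\bar t_j = k]$, it is true that, for
$n \ge k$,

\[ \Pr_i[\bar t_j = k] \, \Pr_i[x_n = j \mid \bar t_j = k] = \bar F^{(k)}_{ij} P^{(n-k)}_{jj}. \]

Using $x_n = j \wedge \bar t_j = k$, $1 \le k \le n$, as a set of alternatives for $x_n = j$,
we have

\[ \begin{aligned}
P^{(n)}_{ij} &= \sum_{k=1}^{n} \Pr_i[\bar t_j = k] \cdot \Pr_i[x_n = j \mid \bar t_j = k] \\
 &= \sum_{k=1}^{n} \bar F^{(k)}_{ij} P^{(n-k)}_{jj} \\
 &= \sum_{k=0}^{n-1} \bar F^{(n-k)}_{ij} P^{(k)}_{jj} \quad \text{by a change of variable.}
\end{aligned} \]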
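The decomposition of Lemma 6-34 is easy to check on a small chain. The sketch below is an illustration under assumptions: it uses the three-state Land of Oz weather matrix (restated here as an assumed example) and computes the first-return probabilities by the taboo recursion $\bar F^{(k+1)}_{ij} = \sum_{m \ne j} P_{im} \bar F^{(k)}_{mj}$, which is a standard identity rather than something taken from this section. The function names are hypothetical.

```python
from fractions import Fraction as Fr

# Assumed example: the Land of Oz chain (three states).
P = [[Fr(1, 2), Fr(1, 4), Fr(1, 4)],
     [Fr(1, 2), Fr(0),    Fr(1, 2)],
     [Fr(1, 4), Fr(1, 4), Fr(1, 2)]]
S = range(len(P))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in S) for j in S] for i in S]

def powers(P, n_max):
    """Return the list [P^0, P^1, ..., P^n_max]."""
    out = [[[Fr(int(i == j)) for j in S] for i in S]]
    for _ in range(n_max):
        out.append(matmul(out[-1], P))
    return out

def first_passage(P, n_max):
    """Fbar[k][i][j] = Pr_i[the first visit to j at a positive time is at time k]."""
    Fbar = [[[Fr(0)] * len(P) for _ in S]]    # Fbar^(0) = 0
    Fbar.append([row[:] for row in P])         # Fbar^(1) = P
    for _ in range(2, n_max + 1):
        prev = Fbar[-1]
        # Avoid state j on the first step, then first-reach j in the remaining time.
        Fbar.append([[sum(P[i][m] * prev[m][j] for m in S if m != j)
                      for j in S] for i in S])
    return Fbar

N = 6
Pn, Fbar = powers(P, N), first_passage(P, N)

# Check P^(n)_ij = sum_{k=1}^{n} Fbar^(k)_ij * P^(n-k)_jj for all n, i, j.
ok = all(Pn[n][i][j] == sum(Fbar[k][i][j] * Pn[n - k][j][j] for k in range(1, n + 1))
         for n in range(1, N + 1) for i in S for j in S)
```

Exact rational arithmetic is used so that the identity can be tested with equality rather than with a tolerance.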

Lemma 6-35: A recurrent chain is noncyclic if and only if the set
$Z = \{k \mid \bar F^{(k)}_{ii} > 0\}$ has greatest common divisor one.

Proof: If $Z$ has greatest common divisor one, then the period for the
state $i$ is one. Conversely, suppose that the greatest common divisor
is $c$. We shall show that $c$ divides $d$, the period of the chain. Hence,
if $c > 1$, the chain is cyclic. Let $n$ be the smallest integer for which
$P^{(n)}_{ii} > 0$ and $c \nmid n$, and write $n = qc + r$ with $0 < r < c$. By
Lemma 6-34,

\[ P^{(n)}_{ii} = \sum_{k=1}^{n} \bar F^{(k)}_{ii} P^{(n-k)}_{ii}
   = \sum_{j=1}^{q} \bar F^{(jc)}_{ii} P^{((q-j)c + r)}_{ii}. \]

Then the right side is zero since every term $P^{((q-j)c + r)}_{ii}$ is zero, a
contradiction. Thus $P^{(n)}_{ii} > 0$ only if $c \mid n$.

The next two lemmas lead to the convergence theorem; the first one 
is a consequence of Proposition 6-32 and the zero-one law for sums of 
independent random variables. 



Lemma 6-36: Let $P$ be a noncyclic recurrent chain, and for a fixed
state $s$ let $E$ and $F$ be any two sets of integers whose union is the set of
all non-negative integers. Then either

\[ \Pr_\pi[x_n = s \text{ for infinitely many } n \in E] = 1 \]

or

\[ \Pr_\pi[x_n = s \text{ for infinitely many } n \in F] = 1 \quad \text{(or both)}, \]

and whichever alternative holds is independent of the starting
distribution $\pi$.



Proof: Form the process P* of Proposition 6-32. We shall first 
show that for any two states i and j there is a state k which it is possible 
to reach from both i and j; for this purpose it is sufficient to show that
from state 0 it is possible to reach all sufficiently large states, since P* 
represents sums of independent random variables. Now the set of 
states which can be reached from 0 is non-empty and is closed under 
addition (since P* represents sums of independent random variables); 
its greatest common divisor is one by Lemma 6-35. Hence by Lemma 
1-66 all sufficiently large states can be reached. 

By the zero-one law, which is Propositions 5-19 and 5-20,

\[ \Pr_i[t_n \in E \text{ infinitely often}] = \Pr_i[x_n = s \text{ for infinitely many } n \in E] \]

is zero or one and is independent of $i$. Thus

\[ \Pr_\pi[x_n = s \text{ for infinitely many } n \in E] \]

is zero or one and is independent of $\pi$. If

\[ \Pr_\pi[x_n = s \text{ for infinitely many } n \in E] = 0 \]

and

\[ \Pr_\pi[x_n = s \text{ for infinitely many } n \in F] = 0, \]

then

\[ \Pr_\pi[x_n = s \text{ infinitely often}] = 0, \]

in contradiction to the recurrence of $P$.

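The following lemma uses the notation $\|\beta\| = \sum_i |\beta_i|$.

Lemma 6-37: Let $P$ be a noncyclic recurrent chain, and let $\beta$ and $\gamma$ be
probability vectors. Then $\lim_{n \to \infty} \|(\beta - \gamma)P^n\| = 0$.

Proof: Let $E = \{n \mid (\beta P^n)_s \le (\gamma P^n)_s\}$ and $F = \{n \mid (\beta P^n)_s \ge (\gamma P^n)_s\}$.
By Lemma 6-36, either

\[ \Pr_\beta[x_n = s \text{ for infinitely many } n \in E] = 1 \]

or

\[ \Pr_\gamma[x_n = s \text{ for infinitely many } n \in F] = 1, \]

and by symmetry we may assume that the former alternative holds.

Let $A^{(n)}$ be the statement that $x_m = s$ for some $m \in E$ with $m < n$,
and let

\[ \beta^{(n)}_k = \Pr_\beta[x_n = k \wedge \sim A^{(n)}]. \]

Then

\[ \|\beta^{(n)}\| = \beta^{(n)}\mathbf{1} = \Pr_\beta[\sim A^{(n)}] \to 0 \]

by the above assumption. Also

\[ \beta^{(0)} = \beta \]

and

\[ \beta^{(n+1)}_k = \begin{cases}
 \sum_j \beta^{(n)}_j P_{jk} & \text{if } n \notin E \\
 \sum_{j \ne s} \beta^{(n)}_j P_{jk} & \text{if } n \in E.
\end{cases} \]

We may represent this last identity conveniently by using $\epsilon$, a row
vector such that $\epsilon_j = \delta_{js}$, and by defining

\[ \delta^{(n)} = \begin{cases}
 \beta^{(n)}_s \epsilon & \text{if } n \in E \\
 0 & \text{otherwise.}
\end{cases} \]

Then

\[ \beta^{(n+1)} = (\beta^{(n)} - \delta^{(n)})P. \]

Next, we define

\[ \gamma^{(n)} = \beta^{(n)} + (\gamma - \beta)P^n. \]

From this relation,

\[ \gamma^{(n-1)}P = \beta^{(n-1)}P + (\gamma - \beta)P^n \]

and

\[ \gamma^{(0)} = \gamma \]

and hence

\[ \gamma^{(n+1)} = \gamma^{(n)}P + \beta^{(n+1)} - \beta^{(n)}P = (\gamma^{(n)} - \delta^{(n)})P. \]

We shall show by induction that $\gamma^{(n)} \ge \delta^{(n)} \ge 0$. First, $\gamma^{(0)} = \gamma \ge 0$.
If $0 \notin E$, then $\delta^{(0)} = 0 \le \gamma^{(0)}$. If $0 \in E$, then $\gamma^{(0)}_s = \gamma_s \ge \beta_s = \beta^{(0)}_s$ by
the definition of $E$, and $\beta^{(0)}_s = \delta^{(0)}_s$. Thus $\gamma^{(0)} \ge \delta^{(0)}$ in either case.
Suppose $\gamma^{(n-1)} \ge \delta^{(n-1)}$. Then

\[ \gamma^{(n)} = (\gamma^{(n-1)} - \delta^{(n-1)})P \ge 0. \]

If $n \notin E$, then $\delta^{(n)} = 0$ and $\gamma^{(n)} \ge \delta^{(n)}$. If $n \in E$, then $[(\gamma - \beta)P^n]_s \ge 0$
by the definition of $E$, and hence

\[ \gamma^{(n)}_s \ge \beta^{(n)}_s = \delta^{(n)}_s \]

by the definition of $\gamma^{(n)}$. Thus $\gamma^{(n)} \ge \delta^{(n)}$ for every $n$.

In particular, we have $\gamma^{(n)} \ge 0$. Thus

\[ \begin{aligned}
\|\gamma^{(n)}\| &= \gamma^{(n)}\mathbf{1} \\
 &= \beta^{(n)}\mathbf{1} + [(\gamma - \beta)P^n]\mathbf{1} \\
 &= \beta^{(n)}\mathbf{1} + (\gamma - \beta)(P^n\mathbf{1}) \\
 &= \beta^{(n)}\mathbf{1} + (\gamma - \beta)\mathbf{1} \\
 &= \beta^{(n)}\mathbf{1} \\
 &= \|\beta^{(n)}\| \to 0.
\end{aligned} \]

Finally

\[ \|(\beta - \gamma)P^n\| = \|\beta^{(n)} - \gamma^{(n)}\| \]

by the definition of $\gamma^{(n)}$, and the right side is

\[ \le \|\beta^{(n)}\| + \|\gamma^{(n)}\| \to 0. \]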

Theorem 6-38: If $P$ is a noncyclic recurrent chain, then $\lim_{n \to \infty} P^n$
exists. If $P$ is ergodic, then $\lim P^n = A = \mathbf{1}\alpha$ and $\lim_n \|\pi P^n - \alpha\| = 0$
for every probability vector $\pi$. If $P$ is null, then $\lim_n (\pi P^n) = 0$ for
every probability vector $\pi$.

Proof: Every recurrent chain is either ergodic or null; taking $\pi$ to
be a vector with 1 in the $i$th entry and zeros elsewhere, we see that the
existence of $\lim P^n$ follows from the other assertions of the theorem.

Let $P$ be ergodic and let $A = \mathbf{1}\alpha$ be the Cesaro limit of the powers of
$P$. We have $\alpha P^n = \alpha$ for every $n$. Letting $\beta = \pi$ and $\gamma = \alpha$ in
Lemma 6-37, we obtain the desired result.

Let $P$ be null and suppose the assertion of the theorem is false.
Then by Corollary 1-64, for some probability vector $\pi$, there is an
increasing sequence $\{n_k\}$ of positive integers and there is a row vector
$\rho \ne 0$ such that

\[ \lim_k (\pi P^{n_k})_i = \rho_i \quad \text{for every state } i. \]

Certainly $\rho_i \ge 0$. Summing on $i$, we obtain

\[ \sum_i \rho_i = \sum_i \lim_k (\pi P^{n_k})_i \le \lim_k \sum_i (\pi P^{n_k})_i = 1, \]

the inequality following from Fatou's Theorem. Applying Lemma
6-37 with $\beta = \pi$ and $\gamma = \pi P$, we see that

\[ \lim_k (\pi P^{n_k + 1})_i = \rho_i \quad \text{for each } i. \]

By Fatou's Theorem,

\[ \rho P = (\lim_k \pi P^{n_k})P \le \lim_k \pi P^{n_k + 1} = \rho. \]

Hence $\rho$ is non-negative superregular and satisfies $\rho\mathbf{1} < \infty$; $\rho$ must be
regular by Proposition 6-4, and the fact that $P$ is null then contradicts
Theorem 6-9.
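Both Lemma 6-37 and the ergodic case of Theorem 6-38 can be watched on a finite chain. The following is a minimal sketch under assumptions: the chain is the three-state Land of Oz matrix (an assumed example), the two starting vectors are arbitrary illustrative choices, and the vector `alpha` is a candidate regular probability vector that the accompanying checks confirm rather than derive.

```python
from fractions import Fraction as Fr

# Assumed example: the Land of Oz chain, which is noncyclic and ergodic.
P = [[Fr(1, 2), Fr(1, 4), Fr(1, 4)],
     [Fr(1, 2), Fr(0),    Fr(1, 2)],
     [Fr(1, 4), Fr(1, 4), Fr(1, 2)]]

def step(v, P):
    """Return the row vector v P."""
    n = len(P)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]

def norm(v):
    """The norm used in Lemma 6-37: the sum of the absolute values of the entries."""
    return sum(abs(x) for x in v)

beta = [Fr(1), Fr(0), Fr(0)]     # start in state 0
gamma = [Fr(0), Fr(0), Fr(1)]    # start in state 2
diff = [b - g for b, g in zip(beta, gamma)]

norms = []
for n in range(8):
    norms.append(norm(diff))     # ||(beta - gamma) P^n||
    diff = step(diff, P)

alpha = [Fr(2, 5), Fr(1, 5), Fr(2, 5)]   # candidate regular probability vector
pi_n = beta
for _ in range(30):
    pi_n = step(pi_n, P)                  # pi_n = beta P^30
gap = norm([a - b for a, b in zip(pi_n, alpha)])
```

For this particular pair of starting vectors the difference happens to be an eigenvector, so the norm shrinks by exactly a factor of 4 at every step; other starting vectors give a less regular but still vanishing sequence.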

Corollary 6-39: If $P$ is a null chain (not necessarily noncyclic), then
$\lim P^n = 0$.

Proof: Let $P$ have period $d$. By Theorem 6-38

\[ \lim_n P^{nd} = 0. \]

By dominated convergence, $\lim_n P^{nd + r} = 0$ for each $r = 0, 1, 2, \ldots, d - 1$.
Hence,

\[ \lim_n P^n = 0. \]



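The classical proof of Theorem 6-38 that follows proves only that
$\lim P^n$ exists.

Second proof of Theorem 6-38: We first prove the theorem for the
$i$th diagonal entry. Set $f_n = \bar F^{(n)}_{ii}$ and

\[ \mu = \sum_n n f_n = \sum_n n \bar F^{(n)}_{ii} = \sum_n n \Pr_i[\bar t_i = n] = M_i[\bar t_i] = \bar M_{ii}. \]

Lemmas 6-34 and 6-35 establish all the hypotheses of Theorem 1-67
except for the fact that $\sum_n f_n = 1$:

\[ \sum_n f_n = \sum_n \bar F^{(n)}_{ii} = \bar H_{ii} = 1. \]

Therefore $\lim_n P^{(n)}_{ii} = 1/\mu$ or 0 according as $\mu$ is finite or infinite, and the
value of the limit for the diagonal entries follows from Proposition 6-25.
For the off-diagonal entries, let $i$ and $j$ be any two distinct states.
Define a row vector $\beta$ and a sequence of column vectors $g^{(n)}$ by

\[ \beta_m = \bar F^{(m)}_{ij}, \qquad
   g^{(n)}_m = \begin{cases} (P^{n-m})_{jj} & \text{if } n \ge m \\ 0 & \text{if } n < m. \end{cases} \]

Then $\lim_n g^{(n)}_m = L_{jj}$ exists since we have proved the theorem for
diagonal entries. Furthermore, by Lemma 6-34,

\[ (P^n)_{ij} = \sum_{m=1}^{n} \bar F^{(m)}_{ij} (P^{n-m})_{jj} = \sum_{m=1}^{\infty} \beta_m g^{(n)}_m = \beta g^{(n)}. \]

Since $\beta\mathbf{1} = 1$ and $g^{(n)}_m \le 1$, the Dominated Convergence Theorem
applies and

\[ \begin{aligned}
\lim_n (P^n)_{ij} &= \lim_n \beta g^{(n)} \\
 &= \beta \lim_n g^{(n)} \\
 &= \beta (L_{jj}\mathbf{1}) \\
 &= L_{jj}.
\end{aligned} \]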

As a converse to the second proof of Theorem 6-38, we shall show that
the convergence of $P^n$ for every noncyclic recurrent chain implies the
truth of the Renewal Theorem. This result is of particular interest
because all that is needed is convergence of the diagonal entries of $P^n$,
when $P$ is a noncyclic recurrent case of the basic example.

Proposition 6-40: If every noncyclic recurrent chain converges to a
limiting matrix, then the Renewal Theorem holds.



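Proof: Let the sequence $\{f_n\}$ be given. Define $r_i = \sum_{k > i} f_k$; the
$r_i$ tend to 0 because $\sum_k f_k = 1$. As long as $r_i > 0$, define

\[ p_{i+1} = \frac{r_{i+1}}{r_i} \quad \text{for } i = 0, 1, 2, \ldots. \]

If $r_i = 0$ for some $i$, then $\beta_i = 0$ and the $p_k$ for $k > i$ are irrelevant.
Set $q_i = 1 - p_i$ and let the $p_i$ and the $q_i$ represent the transition
probabilities associated with the basic example. We have

\[ \beta_i = p_1 p_2 \cdots p_i = \frac{r_1}{r_0} \cdot \frac{r_2}{r_1} \cdots \frac{r_i}{r_{i-1}} = r_i. \]

That is, $\beta_i = r_i$. Since $r_i \to 0$, $\beta_i \to 0$ and the chain is recurrent.
Now

\[ \begin{aligned}
\bar F^{(n)}_{00} &= \beta_{n-1}(1 - p_n) \\
 &= \beta_{n-1} - \beta_n \\
 &= \sum_{k > n-1} f_k - \sum_{k > n} f_k \\
 &= f_n.
\end{aligned} \]

Thus $\mu = \sum_n n f_n = \bar M_{00}$. By Lemma 6-34 we see that $u_n = P^{(n)}_{00}$.
The Markov chain is noncyclic by Lemma 6-35 because the greatest
common divisor of the $k$'s for which $f_k > 0$ is 1. Therefore $\lim u_n =
\lim P^{(n)}_{00}$ exists. On the other hand, by Proposition 6-25 the Cesaro
limit of $P^{(n)}_{00}$ is $1/\bar M_{00} = 1/\mu$ if $\bar M_{00} < \infty$ or 0 if $\bar M_{00} = +\infty$. Hence by
Proposition 1-61

\[ \lim_n u_n = \begin{cases} 1/\mu & \text{if } \mu < \infty \\ 0 & \text{if } \mu = +\infty. \end{cases} \]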
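The link between the renewal sequence and the basic example can be illustrated with a two-state case. This is a sketch under assumptions: the law $f_1 = f_2 = 1/2$ is a hypothetical choice, the recursion $u_0 = 1$, $u_n = \sum_{k=1}^{n} f_k u_{n-k}$ is the standard renewal recursion (assumed here, since the Renewal Theorem itself is stated elsewhere in the book), and the function names are illustrative.

```python
def renewal_sequence(f, n_max):
    """u_0 = 1 and u_n = sum_{k=1}^{n} f_k u_{n-k}; f[0] must be 0."""
    u = [1.0]
    for n in range(1, n_max + 1):
        top = min(n, len(f) - 1)          # f_k = 0 beyond the end of the list
        u.append(sum(f[k] * u[n - k] for k in range(1, top + 1)))
    return u

# Hypothetical law: f_1 = f_2 = 1/2, so the gcd of {1, 2} is 1.
f = [0.0, 0.5, 0.5]
mu = sum(k * fk for k, fk in enumerate(f))   # mean return time

# Basic example built from f as in the proof: r_0 = 1, r_1 = 1/2, r_2 = 0,
# hence p_1 = 1/2 and p_2 = 0, which leaves only the states 0 and 1.
P = [[0.5, 0.5],    # from 0: return to 0 with q_1 = 1/2, move to 1 with p_1 = 1/2
     [1.0, 0.0]]    # from 1: return to 0 with q_2 = 1

def p00_powers(P, n_max):
    """Return the list of (P^n)_00 for n = 0, ..., n_max."""
    row = [1.0, 0.0]                      # row 0 of P^0
    out = [row[0]]
    for _ in range(n_max):
        row = [sum(row[i] * P[i][j] for i in range(2)) for j in range(2)]
        out.append(row[0])
    return out

u = renewal_sequence(f, 40)
p00 = p00_powers(P, 40)
```

In this example $u_n$ and $(P^n)_{00}$ agree term by term, and both settle near $1/\mu = 2/3$.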



6. Mean first passage time matrix 

The matrices $M$ and $\bar M$ have already been defined by

\[ M_{ij} = M_i[t_j], \qquad \bar M_{ij} = M_i[\bar t_j]. \]

In Proposition 6-25 we saw that

\[ \bar M_{ii} = \begin{cases}
 1/\alpha_i & \text{if } P \text{ is ergodic and } \alpha\mathbf{1} = 1 \\
 \infty & \text{if } P \text{ is null.}
\end{cases} \]

Proposition 6-41: In any recurrent chain, $\bar M = E + PM$.

Proof: Apply Theorem 4-11 with the random time equal to one.
Then

\[ \begin{aligned}
M_i[\bar t_j] &= \sum_k P_{ik} M_k[t_j + 1] \\
 &= \sum_k P_{ik}(M_{kj} + 1) \\
 &= \sum_k P_{ik} M_{kj} + \sum_k P_{ik} \\
 &= (PM)_{ij} + 1.
\end{aligned} \]

For an ergodic chain $P$ we define $D$ to be the diagonal matrix whose
diagonal entries are $1/\alpha_i$, where $\alpha\mathbf{1} = 1$. From Proposition 6-25 we
see that

\[ \bar M = D + M. \]



Proposition 6-42: If $P$ is an ergodic chain, the mean first passage
time matrix $M$ satisfies these properties:

(1) $M_{dg} = 0$ and $M \ge 0$.

(2) $M$ is finite-valued.

(3) $(I - P)M = E - D$.

Proof: The first statement is obvious, and the third follows
immediately from Proposition 6-41 and the identity $\bar M = D + M$ if we
can show that $M$ is finite-valued. The problem thus reduces to proving
(2). We know that $M_{ii} = 0$; therefore let $i$ and $j$ be distinct states.
We shall show that $M_{ij}$ is finite. Let $t = \min(\bar t_i, \bar t_j)$. Then

\[ \begin{aligned}
1/\alpha_j = \bar M_{jj} &= M_j[\bar t_j] \\
 &\ge \Pr_j[x_t = i] \, M_j[\bar t_j \mid x_t = i] \\
 &\ge \Pr_j[x_t = i] \, M_{ij} \quad \text{by Theorem 4-11.}
\end{aligned} \]

If we can show that $\Pr_j[x_t = i] > 0$, we will have

\[ M_{ij} \le \frac{\bar M_{jj}}{\Pr_j[x_t = i]} < \infty. \]



But $0 < \alpha_i/\alpha_j = {}^jN_{ji} = {}^jH_{ji}\,{}^jN_{ii}$ by Corollary 6-21 and Proposition
4-15, so that ${}^jH_{ji} > 0$.



The remarkable fact about the mean first passage time matrix $M$ for
ergodic chains is that the converse of Proposition 6-42, namely
Theorem 6-43, is also true. Thus once a candidate for $M$ has been
found, even by guessing, we need check only that it satisfies (1), (2),
and (3).

Theorem 6-43: If $P$ is an ergodic chain, the mean first passage time
matrix $M$ is characterized by these properties:

(1) $M_{dg} = 0$ and $M \ge 0$.

(2) $M$ is finite-valued.

(3) $(I - P)M = E - D$.



Proof: Proposition 6-42 shows that $M$ satisfies these properties.
Let $Y$ be any matrix for which (1), (2), and (3) hold. Let 0 be an
arbitrary fixed state of the chain. It is sufficient to show that $y$, the
zeroth column of $Y$, is equal to $m$, the zeroth column of $M$. Forming
the chain ${}^0P$, in which state 0 has been made absorbing, and writing

\[ {}^0P = \begin{pmatrix} 1 & 0 \\ R & Q \end{pmatrix}
   \quad \text{and} \quad y = \begin{pmatrix} y_0 \\ \hat y \end{pmatrix}
   \quad \text{and} \quad m = \begin{pmatrix} m_0 \\ \hat m \end{pmatrix}, \]

we see that $\hat m = \{M_i[a]\}$ and by Corollary 5-17 that $\hat m$ is the minimum
finite-valued non-negative solution of the equation $(I - Q)x = \mathbf{1}$.
We first show that $\hat y$ is another finite-valued non-negative solution.
We know that $\hat y$ is finite-valued and non-negative by hypothesis. The
identity $(I - P)Y = E - D$ yields in the zeroth column

\[ \begin{pmatrix} * & * \\ -R & I - Q \end{pmatrix}
   \begin{pmatrix} y_0 \\ \hat y \end{pmatrix}
   = \begin{pmatrix} 1 - 1/\alpha_0 \\ \mathbf{1} \end{pmatrix}, \]

where the entries marked $*$ play no role in what follows.
But $y_0 = 0$ so that $(I - Q)\hat y = \mathbf{1}$. We conclude that $\hat y \ge \hat m$. Since
$y_0 = m_0 = 0$, we have $y \ge m$. Hence

\[ (I - P)(y - m) = (I - P)y - (I - P)m = 0 \]

and $y - m$ is a finite-valued non-negative $P$-regular function. Thus
$y - m = c\mathbf{1}$ by Proposition 6-3. Looking at the zeroth entries, we see
that $0 = y_0 - m_0 = c$. Therefore, $y = m$.
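Theorem 6-43 suggests a practical check: compute a candidate $M$ and test properties (1) to (3). The sketch below is an illustration under assumptions, not the book's procedure. It uses the finite Land of Oz chain (an assumed example), obtains each column of $M$ by solving $(I - Q)x = \mathbf{1}$ with the target state deleted, and builds $D$ from $\bar M_{ii} = 1 + (PM)_{ii}$ as in Proposition 6-41. The helper names `solve` and `mean_first_passage` are hypothetical.

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * b_ for a, b_ in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def mean_first_passage(P):
    """Column j of M solves (I - Q) x = 1, where Q is P without row and column j."""
    n = len(P)
    M = [[Fr(0)] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        A = [[Fr(int(i == k)) - P[i][k] for k in others] for i in others]
        for i, mi in zip(others, solve(A, [Fr(1)] * len(others))):
            M[i][j] = mi
    return M

# Assumed example: the Land of Oz chain.
P = [[Fr(1, 2), Fr(1, 4), Fr(1, 4)],
     [Fr(1, 2), Fr(0),    Fr(1, 2)],
     [Fr(1, 4), Fr(1, 4), Fr(1, 2)]]
n = len(P)
M = mean_first_passage(P)

# Mean return times from Proposition 6-41, and alpha_i = 1 / Mbar_ii.
Mbar_diag = [1 + sum(P[i][k] * M[k][i] for k in range(n)) for i in range(n)]
alpha = [1 / x for x in Mbar_diag]

# Property (3): (I - P) M = E - D, with E the matrix of all ones.
lhs = [[M[i][j] - sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)]
       for i in range(n)]
rhs = [[Fr(1) - (Mbar_diag[i] if i == j else 0) for j in range(n)]
       for i in range(n)]
```

A useful consistency check is that the vector `alpha` recovered from the mean return times is a probability vector satisfying $\alpha P = \alpha$.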




7. Examples of the mean first passage time matrix 

In this section we shall compute the mean first passage time matrix
associated with two infinite recurrent chains. The first example is a
reflecting random walk, and the second is the basic example.

Example 1: Reflecting random walk.

A random walk on the non-negative integers is defined by the
transition probabilities

\[ \begin{aligned}
P_{00} &= q, \quad \text{where } q \ne 0 \text{ and } q \ne 1, \\
P_{i,i+1} &= p = 1 - q \quad \text{for } i \ge 0, \\
P_{i,i-1} &= q \quad \text{for } i > 0.
\end{aligned} \]

We note that the process $P$ with 0 made absorbing is the infinite
drunkard's walk $P'$. For the present chain we have

\[ \bar H_{00} = pH_{10} + qH_{00} = pH_{10} + q. \]

But $H_{10}$ is the absorption probability $B_{10}$ for the infinite drunkard's
walk, and $B_{10} = 1$ if $p \le q$ and $B_{10} < 1$ if $p > q$. Therefore

\[ \bar H_{00} \begin{cases} = 1 & \text{if } p \le q \\ < 1 & \text{if } p > q, \end{cases} \]

and $P$ is recurrent if and only if $p \le q$.

A similar relation holds for $\bar M_{00}$; we have

\[ \bar M_{00} = 1 + pM_{10} + qM_{00} = 1 + pM_{10}. \]

Since $M_{10}$ is the mean time to absorption $M_1[a]$ in the $P'$ chain, we see
that $\bar M_{00}$ is finite if and only if $p < q$. That is,

\[ P \text{ is } \begin{cases}
 \text{transient} & \text{if } p > q \\
 \text{null} & \text{if } p = q \\
 \text{ergodic} & \text{if } p < q.
\end{cases} \]

The chain is never cyclic, since $P_{00} > 0$.

We shall compute $M$ for the ergodic case. Let $r = q/p > 1$. A
$P$-regular measure $\alpha$ must satisfy $\alpha = \alpha P$, or

\[ \begin{aligned}
\alpha_0 &= \alpha_0 q + \alpha_1 q, \\
\alpha_i &= \alpha_{i-1} p + \alpha_{i+1} q \quad \text{for } i > 0.
\end{aligned} \]

From the first equation we obtain $\alpha_1 = \alpha_0/r$, and from the second, which
is a second-order difference equation, we obtain

\[ \alpha_i = A + Br^{-i} \quad \text{for } i > 0. \]

From the two equations we find $\alpha_0 = A + B$ and $\alpha_0/r = A + B/r$.
Therefore, $A/r = A$, and since $r > 1$, we must have $A = 0$. Choosing
$B$ so that $\alpha\mathbf{1} = 1$, we arrive at the result

\[ \alpha_i = (1 - 1/r) r^{-i}. \]

The entries $M_{i0}$ of the mean first passage time matrix are easily
found once the value of $\alpha_0$ is known. Letting $m$ be the zeroth column
of the $M$ matrix, we see from Proposition 6-42 that

\[ (I - P)m = \mathbf{1} - \{\delta_{i0}/\alpha_0\} \]

or that

\[ m_0 - q m_0 - p m_1 = 1 - 1/\alpha_0 \]

and

\[ m_i - q m_{i-1} - p m_{i+1} = 1 \quad \text{for } i > 0. \]

Since $\alpha_0 = 1 - p/q$, we have $1 - 1/\alpha_0 = -p/(q - p)$. The fact that
$m_0 = 0$ then reduces the first relation to

\[ -p m_1 = -p/(q - p), \]

so that

\[ m_1 = 1/(q - p). \]

The difference equation for $m_i$ has as a general solution

\[ m_i = A + B(q/p)^i + i/(q - p) \quad \text{for } i \ge 0. \]

The boundary conditions on $m_0$ and $m_1$ imply that $A = B = 0$ and
that

\[ M_{i0} = i/(q - p). \]



The computation of $M_{0j}$ uses the same general methods. First, we
note that if $i < j$, then the process must pass through $i$ from 0 to get
to $j$. Hence

\[ M_{0i} + M_{ij} = M_{0j} \]

or

\[ M_{ij} = M_{0j} - M_{0i}. \]

Now

\[ M_{01} = p + q(1 + M_{01}) \]

so that

\[ M_{01} = 1/p. \]

For $0 < i < j$,

\[ M_{ij} = p(1 + M_{i+1,j}) + q(1 + M_{i-1,j}) \]

or

\[ M_{0j} - M_{0i} = 1 + p(M_{0j} - M_{0,i+1}) + q(M_{0j} - M_{0,i-1}). \]

Thus for $i > 0$,

\[ pM_{0,i+1} - M_{0i} + qM_{0,i-1} = 1 \]

and for $i \ge 0$,

\[ pM_{0,i+2} - M_{0,i+1} + qM_{0i} = 1. \]

Solving this equation and using the relations $M_{00} = 0$ and $M_{01} = 1/p$,
we find

\[ M_{0i} = \frac{q}{(q - p)^2}\,(r^i - 1) - \frac{i}{q - p}. \]

Algebraic manipulation yields the alternate formula

\[ M_{0i} = \frac{1}{p} \sum_{k=0}^{i-1} (i - k) r^k. \]
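The formulas for the reflecting random walk can be cross-checked in exact arithmetic. The sketch below assumes the hypothetical parameter values $p = 1/3$, $q = 2/3$ (so $r = 2$) and uses illustrative function names; it compares the closed form for $M_{0i}$ with the summation form, tests the difference equation $pM_{0,i+1} - M_{0i} + qM_{0,i-1} = 1$, and checks the mean return time $\bar M_{ii} = 1/\alpha_i$ through $\bar M_{ii} = 1 + pM_{i+1,i} + qM_{i-1,i}$.

```python
from fractions import Fraction as Fr

p, q = Fr(1, 3), Fr(2, 3)   # hypothetical ergodic case, p < q
r = q / p

def M0_closed(i):
    """Closed form for M_0i."""
    return q / (q - p) ** 2 * (r ** i - 1) - Fr(i) / (q - p)

def M0_sum(i):
    """Summation form for M_0i."""
    return sum((i - k) * r ** k for k in range(i)) / p

def Mi0(i):
    """M_i0 = i / (q - p)."""
    return Fr(i) / (q - p)

def alpha(i):
    """Regular probability measure alpha_i = (1 - 1/r) r^{-i}."""
    return (1 - 1 / r) / r ** i

def mean_return(i):
    """Mbar_ii for i > 0: one step, then M_{i+1,i} = M_10 or M_{i-1,i} = M_0i - M_0,i-1."""
    return 1 + p * Mi0(1) + q * (M0_closed(i) - M0_closed(i - 1))

closed = [M0_closed(i) for i in range(12)]
summed = [M0_sum(i) for i in range(12)]
```

With these values $M_{01} = 3$ and $M_{02} = 12$, and the two expressions for $M_{0i}$ agree for every $i$ tested.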

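Since $M_{i0} = M_{ij} + M_{j0}$ when $i > j$ and since $M_{0i} + M_{ij} = M_{0j}$
when $i < j$, we may summarize our results as follows:

\[ M_{ij} = \begin{cases}
 \dfrac{i - j}{q - p} & \text{if } i \ge j \\[2ex]
 \dfrac{q}{(q - p)^2}\,(r^j - r^i) - \dfrac{j - i}{q - p} & \text{if } i < j.
\end{cases} \]

Example 2: Basic Example.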

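The vector $\beta$ defined for the basic example has the property that
$\beta P = \beta$ if and only if $\lim_{i \to \infty} \beta_i = 0$. Furthermore, $P$ is recurrent if
and only if $\lim_{i \to \infty} \beta_i = 0$. When $P$ is recurrent, it is null if $\sum_i \beta_i$ is
infinite and ergodic if $\sum_i \beta_i$ is finite. In the ergodic case the regular
measure $\alpha$ for which $\alpha\mathbf{1} = 1$ has entries

\[ \alpha_i = \frac{\beta_i}{\sum_k \beta_k}. \]

Entries $M_{ij}$ of the mean first passage time matrix for the basic
example satisfy the equations

\[ M_{0i} + M_{ij} = M_{0j} \quad \text{for } i < j \]

and

\[ M_{ij} = M_{i0} + M_{0j} \quad \text{for } i > j. \]

Since

\[ M_{i0} + M_{0i} = \bar M_{ii} = \frac{\sum_k \beta_k}{\beta_i} \quad \text{for } i > 0, \]

it is sufficient to compute $M_{0i}$ for the chain. Taking the statements
{the process moves from 0 to $k \le i - 1$ and then to zero, the process
moves directly to $i$} as a set of alternatives, we find that

\[ M_{0i} = i\beta_i + \sum_{k=0}^{i-1} \beta_k q_{k+1} (k + 1 + M_{0i}). \]

Solving this equation with the aid of the relation $\beta_k q_{k+1} = \beta_k - \beta_{k+1}$,
we obtain

\[ M_{0i} = \frac{1}{\beta_i} \sum_{k < i} \beta_k. \]

Therefore, for $i > 0$,

\[ M_{i0} = \frac{1}{\beta_i} \sum_{k \ge i} \beta_k. \]

The general entries may be computed from

\[ M_{ij} = M_{0j} - M_{0i} \quad \text{if } i < j, \]

\[ M_{ij} = M_{i0} + M_{0j} \quad \text{if } i > j. \]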
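A concrete instance of the basic example makes these formulas easy to test. The following sketch assumes the hypothetical choice $p_i = 1/2$ for every $i$, so that $\beta_k = 2^{-k}$ and $\sum_k \beta_k = 2$; the function names are illustrative. It checks that the closed form for $M_{0i}$ satisfies the equation obtained from the set of alternatives and that $M_{i0} + M_{0i}$ equals the mean return time.

```python
from fractions import Fraction as Fr

def beta(k):
    """Hypothetical choice p_i = 1/2 for every i, so beta_k = 2^{-k}."""
    return Fr(1, 2 ** k)

BETA_TOTAL = Fr(2)   # sum of 2^{-k} over all k >= 0

def q(k):
    """q_k = 1 - p_k, where p_k = beta_k / beta_{k-1}."""
    return 1 - beta(k) / beta(k - 1)

def M0(i):
    """M_0i = (1 / beta_i) * sum of beta_k over k < i."""
    return sum(beta(k) for k in range(i)) / beta(i)

def Mi0(i):
    """M_i0 = (1 / beta_i) * sum of beta_k over k >= i (tail from the known total)."""
    tail = BETA_TOTAL - sum(beta(k) for k in range(i))
    return tail / beta(i)

def rhs_alternatives(i):
    """Right side of M_0i = i beta_i + sum_{k<i} beta_k q_{k+1} (k + 1 + M_0i)."""
    return i * beta(i) + sum(beta(k) * q(k + 1) * (k + 1 + M0(i)) for k in range(i))

values = [(M0(i), Mi0(i)) for i in range(1, 8)]
```

Here $M_{0i} = 2^{i+1} - 2$ grows geometrically while $M_{i0} = 2$ for every $i > 0$, which reflects the constant chance $1/2$ of dropping back to 0 at each step.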

8. Reverse Markov chains 

Let $\{x_n\}$ be the outcome functions for a denumerable Markov process
defined on a space $\Omega$ and with range in $S$. The outcome functions
appear in a certain order and represent the forward passage of time. 
One may well wonder, however, if, when the functions are looked at in 
reverse order, the process is in any sense still Markovian. It is the 
purpose of this section to discuss this question; as a by-product of the 
discussion, we shall gain an interpretation for the dual of an ergodic 
Markov chain. 

The sense in which a Markov process reversed in time is still a Markov 
process is the following. 

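Proposition 6-44: Let $\{x_n\}$ be a denumerable Markov process and let
$N$ be a fixed positive integer. Define $y_n = x_{N-n}$ for $0 \le n \le N$ and
$y_n =$ "stop" for $n > N$. Then the functions $y_n$ are the outcome
functions for a denumerable Markov process with the same state space
with "stop" adjoined.

Proof: We shall show that the functions $y_n$ satisfy the Markov
property. Clearly, this needs to be checked only for $n \le N$. If
$\Pr[y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}] > 0$, then

\[ \begin{aligned}
\Pr[y_n = c_n &\mid y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}] \\
 &= \Pr[x_{N-n} = c_n \mid x_N = c_0 \wedge \cdots \wedge x_{N-n+1} = c_{n-1}] \\
 &= \frac{\Pr[x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1} \wedge \cdots \wedge x_N = c_0]}
         {\Pr[x_{N-n+1} = c_{n-1} \wedge \cdots \wedge x_N = c_0]}.
\end{aligned} \]

The numerator is

\[ \begin{aligned}
&\Pr[x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1}] \\
&\quad \times \Pr[x_{N-n+2} = c_{n-2} \mid x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1}] \\
&\quad \times \cdots \times \Pr[x_N = c_0 \mid x_{N-n} = c_n \wedge \cdots \wedge x_{N-1} = c_1],
\end{aligned} \]

which by the Markov property is

\[ \Pr[x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1}] \cdot
   \Pr[x_{N-n+2} = c_{n-2} \mid x_{N-n+1} = c_{n-1}] \cdots \Pr[x_N = c_0 \mid x_{N-1} = c_1]. \]

The denominator similarly reduces to

\[ \Pr[x_{N-n+1} = c_{n-1}] \cdot
   \Pr[x_{N-n+2} = c_{n-2} \mid x_{N-n+1} = c_{n-1}] \cdots \Pr[x_N = c_0 \mid x_{N-1} = c_1]. \]

Dividing, we obtain

\[ \begin{aligned}
\Pr[y_n = c_n \mid y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}]
 &= \frac{\Pr[x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1}]}{\Pr[x_{N-n+1} = c_{n-1}]} \\
 &= \Pr[x_{N-n} = c_n \mid x_{N-n+1} = c_{n-1}] \\
 &= \Pr[y_n = c_n \mid y_{n-1} = c_{n-1}].
\end{aligned} \]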

It is not true in general that, if the original process is a Markov chain
$P$, then the new process is also a Markov chain. Let $P$ be started with
distribution $\pi$. We then have, if $\Pr[y_{n-1} = i] > 0$,

\[ \begin{aligned}
\Pr_\pi[y_n = j \mid y_{n-1} = i] &= \Pr_\pi[x_{N-n} = j \mid x_{N-n+1} = i] \\
 &= \frac{\Pr_\pi[x_{N-n} = j \wedge x_{N-n+1} = i]}{\Pr_\pi[x_{N-n+1} = i]} \\
 &= \frac{(\pi P^{N-n})_j P_{ji}}{(\pi P^{N-n+1})_i}.
\end{aligned} \]

The last quantity on the right need not be independent of $n$.
Nevertheless, if $P$ is ergodic, there is a case where we can state a positive
result, one which gives us an interpretation for the dual of $P$.

Proposition 6-45: Let $\{x_n\}$ be the outcome functions for an ergodic
chain $P$, let $N$ be a fixed positive integer, and let $\alpha$ be the unique
$P$-regular probability measure. If $P$ is started with distribution $\alpha$, then
the process $\{y_n = x_{N-n},\ 0 \le n \le N\}$ is an initial segment of the
Markov chain with transition matrix $\hat P$ and with starting distribution $\alpha$.

Proof: If $\Pr[y_{n-1} = i] > 0$, then $\alpha P^{N-n} = \alpha P^{N-n+1} = \alpha$ and

\[ \Pr_\alpha[y_n = j \mid y_{n-1} = i] = \frac{\alpha_j P_{ji}}{\alpha_i} = \hat P_{ij}, \]

independently of $n$ for $n \le N$.

The motivation for calling $\hat P$ the reverse chain when $P$ is recurrent
now becomes clear.
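The time-reversal statement can be checked path by path on a small chain. The sketch below is illustrative only: it assumes a walk on a circle of three points (clockwise with probability 2/3, counterclockwise with probability 1/3), for which the uniform vector is regular, and it takes the dual to have entries $\alpha_j P_{ji}/\alpha_i$. The function names are hypothetical.

```python
from fractions import Fraction as Fr

# Assumed example: walk on a circle of three points, clockwise with
# probability 2/3 and counterclockwise with probability 1/3.
P = [[Fr(0),    Fr(2, 3), Fr(1, 3)],
     [Fr(1, 3), Fr(0),    Fr(2, 3)],
     [Fr(2, 3), Fr(1, 3), Fr(0)]]
alpha = [Fr(1, 3)] * 3      # regular probability vector (P is doubly stochastic)
n = len(P)

def reverse_chain(P, alpha):
    """Dual (reverse) matrix with entries alpha_j P_ji / alpha_i."""
    return [[alpha[j] * P[j][i] / alpha[i] for j in range(n)] for i in range(n)]

P_hat = reverse_chain(P, alpha)

def path_prob(path, T):
    """Probability of the path under transition matrix T, started with alpha."""
    pr = alpha[path[0]]
    for a, b in zip(path, path[1:]):
        pr *= T[a][b]
    return pr

# Reading a forward path backwards should give a path of the reverse chain
# with the same probability.
triples = [(a, b, c) for a in range(n) for b in range(n) for c in range(n)]
reversal_ok = all(path_prob([a, b, c], P_hat) == path_prob([c, b, a], P)
                  for a, b, c in triples)
```

In this example the reverse chain is simply the transpose of $P$, that is, the same walk with the two directions interchanged.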
9. Problems

1. Compute for the Land of Oz example P^, and P®. What is A — 
$\lim P^n$? Show that each row of $A$ is a regular measure and that each
column is a regular function.



0 0 0 0

The process is started in state 1. Find the probability of being in the
various states in the long run.

3. In the basic example, let

(rfr)’

Is the chain transient, null, or ergodic?

4. Prove that $\alpha = \mathbf{1}^T$ is regular for any sums of independent random
variables process. Give a careful statement as to the existence of 
transient, null, and ergodic examples. 

5. Establish the following relationships between a chain with transition 
matrix P and one with matrix P^: 

(a) If P is transient, then P^ is transient. 

(b) If P is recurrent, then P^ is recurrent and = ca^. 

(c) If P is ergodic, then P^ is ergodic, but the converse is not always true. 

6. Prove that if a recurrent P has column-sums equal to 1, then P = P^. 

7. Consider sums of independent random variables on the integers with

= i and Pi = §. Choose two essentially different positive regular 
measures a, and show that each gives a correct expression for Wjy in 
Corollary 6-20. 

8. Show that if Pjj > 0 for a single state in a recurrent chain, then the chain 
is noncyclic. 

9. Show by an example that $M_{ij} = \alpha_j M_{ji}/\alpha_i$ need not be true.

10. Show that in an ergodic chain $\alpha M$ may be either finite-valued or
infinite-valued.

11. Determine whether the following chain is transient or recurrent: 



0 10 0 



1111 
4 4 4 4 



0 0 0 1
If transient: Put into standard form, and find N, B, and a.
If recurrent: Is it cyclic? Find a, M, P, M.

12. Do the same for 




13. Find a and M for an independent trials process by the methods of this 
chapter, and check your answers by a direct computation. [See Problem 
5 in Chapter 4.] 

14. (a) Complete the work of finding M for the basic example. 

(b) Find the reverse of the basic example (when recurrent), and compute 
M for this chain. 

15. A light bulb in a fixture lasts $j$ time units with probability $f_j$. It is
replaced with a similar bulb when it burns out. Assume that $\sum_j f_j = 1$,
$f_0 = 0$, and $f_1 > 0$. Let $x_n$ be the length of time that the bulb in use at
time $n$ has lasted (taken to be 0 if there is a replacement at time $n$).
Show that $\{x_n\}$ is the set of outcome functions for a Markov chain and
discuss the connection with the basic example. Show that the
probability that a bulb is replaced at time $n$ tends to a limit as $n \to \infty$.

Problems 16 to 26 refer to sums of independent random variables on the
circle. Mark $n$ ($n \ge 3$) points on a circle, labeled $i = 0, 1, \ldots, n - 1$
clockwise. The process either moves one step clockwise with probability 2/3, or it
moves one step counterclockwise with probability 1/3.

16. Prove that the chain is ergodic. Is it cyclic ? 

17. Find a positive regular measure $\alpha$ with $\alpha\mathbf{1} = 1$. Interpret it.

18. Construct the reverse chain. 

19. Compute $M$ by means of Theorem 6-43. [It is sufficient to find $M_{i0}$.]

20. Show that for large $n$,

\[ M_{i0} \approx 3\left(n - i - n\left(\tfrac{1}{2}\right)^i\right). \]

Compare this result with the absorption times of a drunkard's walk on
$\{0, 1, \ldots, n\}$ with $p = \tfrac{2}{3}$.

21. Show that the approximation in the previous problem is excellent for 
n = 50. 

22. Use the approximation of Problem 20 to show that the maximum value
of $M_{i0}$ occurs approximately at

\[ i = \frac{\log n}{\log 2}. \]

Check this conclusion for $n = 50$.

23. Let $n$ be even, and let $E$ be the set of even-numbered states. Compute
$P^E$.

24. For $n = 3$, compute P^, P^, P®, and P®. What is the limit of $P^n$?

25. Repeat Problem 24 for $n = 4$.

26. Show that $\alpha M = c_n \mathbf{1}^T$, and find an asymptotic expression for $c_n$.
1. Brownian motion

One of the fruitful achievements of probability theory in recent years 
has been the recognition that two seemingly unrelated theories in 
physics — one for Brownian motion and one for potentials — are mathe- 
matically equivalent. That is, the results of the two theories are in 
one-to-one correspondence and any proof of a result in one theory can 
be translated directly into a proof of the corresponding result in the 
other theory. In this chapter we shall sketch how this equivalence 
comes about, and we shall see that Brownian motion is a process which 
is like a Markov chain except that it does not have a denumerable state 
space and time does not proceed in discrete steps. The details of this 
equivalence can be found in Knapp [1965]. The important thing to 
notice will be that the definitions of potential-theoretic concepts in 
terms of Brownian motion do not depend on isolated specific properties 
of the process but depend only on the Markovian character of Brownian 
motion. In other words, there is reasonable hope of defining for an 
arbitrary Markov chain a potential theory in which analogs of the 
classical theorems hold. 

We begin by describing Brownian motion. In 1826 the botanist
Robert Brown observed that microscopic particles, when left alone in a
liquid, are seen to move constantly in the fluid along erratic paths.
Much later Albert Einstein investigated this movement of particles
from a theoretical point of view. Einstein was able to derive statistical
laws which estimate how a large number of particles spread over a
period of time, and his predictions were verified.

In setting up a probabilistic model for this so-called Brownian motion, 
we simply replace Einstein’s estimate of what happens to a large 
number of particles by a probability for what happens to one particle. 
We are then to require that

\[ \Pr[\text{particle started at } u \text{ is in } E \text{ at time } t]
   = \int_E \frac{1}{(2\pi t)^{3/2}} \, e^{-|u - y|^2/2t} \, dy, \]

where $E$ is a Borel set in three-dimensional (Euclidean) space and
$|u - y|$ denotes the Euclidean distance from $u$ to $y$. If we use the
notation $\Pr_u[x_t \in E]$ for the left side and the notation $p^t(u, E)$ for the
right side, we have

\[ \Pr_u[x_t \in E] = p^t(u, E). \]

By Theorem 1-41, $p^t(u, E)$ is a measure depending on $t$ and $u$ and
defined on the smallest Borel field containing the open sets of $R^3$.
Therefore, we may write

\[ \Pr_u[x_t \in E] = \int_E p^t(u, dy). \]

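The physical theory also makes us require that if $t_1 < t_2 < \cdots < t_n$,
then $x_{t_1}, x_{t_2} - x_{t_1}, \ldots, x_{t_n} - x_{t_{n-1}}$ should behave like a set of
independent random variables with $x_{t+s} - x_t$ having the same
distribution as $x_s$. That is, we require that

\[ \begin{aligned}
\Pr_0[x_{t_1} \in E_1 \wedge \cdots \wedge (x_{t_n} - x_{t_{n-1}}) \in E_n]
 &= \Pr_0[x_{t_1} \in E_1] \cdots \Pr_0[x_{t_n} - x_{t_{n-1}} \in E_n] \\
 &= \Pr_0[x_{t_1} \in E_1] \cdots \Pr_0[x_{(t_n - t_{n-1})} \in E_n].
\end{aligned} \]

This identity implies that we must have, for times $s < t < \cdots < r < v$,

\[ \Pr_u[x_s \in E \wedge x_t \in F \wedge \cdots \wedge x_r \in G \wedge x_v \in H]
   = \int_E p^s(u, dw) \int_F p^{t-s}(w, dx) \int \cdots \int_H p^{v-r}(y, dz). \]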

We note that with these various requirements we have given more than
one definition for $\Pr_u[x_t \in E]$ and that we must check, for example,
that

\[ \Pr_u[x_s \in R^3 \wedge x_t \in E] = \Pr_u[x_t \in E] \]

and

\[ \Pr_u[x_s \in F \wedge x_t \in R^3] = \Pr_u[x_s \in F]. \]

Such identities can be verified by direct calculation. It should not be 
too surprising that such consistency conditions arise since they arose 
with denumerable stochastic processes earlier: In the proof of the 
Kolmogorov Extension Theorem in Chapter 2 we required that the 
measures on cylinder sets all be consistent. 

Now for any denumerable Markov chain $P$ we have

(1) $0 \le P_{ij} = \Pr_i[x_1 = j]$,

(2) $\Pr_i[x_1 \in S] = \sum_j P_{ij} \le 1$,

(3a) $\Pr_i[x_1 = j \wedge x_2 = k \wedge \cdots \wedge x_{n-1} = r \wedge x_n = s] = P_{ij} P_{jk} \cdots P_{rs}$.

The last equality (3a) implies and is implied by

(3b) $\Pr_i[x_1 \in E \wedge x_2 \in F \wedge \cdots \wedge x_{n-1} \in G \wedge x_n \in H]
 = \sum_{j \in E} P_{ij} \sum_{k \in F} P_{jk} \cdots \sum_{s \in H} P_{rs}$.

It is easy to prove both the Markov property and the Markov chain
property from (3a), and hence (1), (2), and (3b) give an equivalent
definition of Markov chain. The analogous statements for Brownian
motion are

(1') $0 \le \int_E p^t(u, dy) = \Pr_u[x_t \in E]$,

(2') $\Pr_u[x_t \in R^3] = \int_{R^3} p^t(u, dy) = 1$,

and

(3') $\Pr_u[x_s \in E \wedge x_t \in F \wedge \cdots \wedge x_v \in H]
 = \int_E p^s(u, dw) \int_F p^{t-s}(w, dx) \int \cdots \int_H p^{v-r}(y, dz)$.

As expected, these statements imply that the position of a particle at
time $t + s$ depends only on $s$ and the particle's position at time $t$, and
not on the value of $t$ or what happened to the particle before time $t$.
(This assertion can be formulated precisely in terms of means of 
functions given a Borel field, which are a technical generalization of 
conditional means of functions given a partition.) 

As with denumerable Markov chains we need not require that a
Brownian motion particle be started deterministically at a state $u$.
If we start the particle according to probabilities assigned by a measure
$\mu$ on $R^3$, then we have

\[ \Pr_\mu[x_t \in E] = \int_{R^3} \Pr_u[x_t \in E] \, d\mu(u)
   = \int_{R^3} \int_E p^t(u, dy) \, d\mu(u). \]



A similar expression holds for the probability of being in a finite 
sequence of sets at specified times. 

In Section 3 we shall need a formal definition of Brownian motion,
and we use the formula for $\Pr_\mu[x_t \in E]$ to motivate it. We define a
transformation $P^t$ of the measures $\mu$ on $R^3$ with $\mu(R^3) = 1$ into
themselves by

\[ (\mu P^t)(E) = \Pr_\mu[x_t \in E]
   = \int_E \left[ \int_{R^3} \frac{1}{(2\pi t)^{3/2}} \, e^{-|u - y|^2/2t} \, d\mu(u) \right] dy. \]

Later we shall replace the expression in brackets by $f(y, t)$ for simplicity
of notation. That $\mu P^t$ is a measure follows from Theorem 1-41, and
that $(\mu P^t)(R^3) = 1$ follows from the identity

after a change of variable.

Definition 7-1: The totality of theorems about the operators $P^t$, the
measures $\mu$ on $R^3$ with $\mu(R^3) = 1$, and quantities definable in terms of
them and properties of $R^3$ is called Brownian motion theory.

We immediately extend $P^t$ by linearity to be defined on all finite
measures on $R^3$ and all differences of two finite measures.

2. Potential theory 

Classical potential theory begins as a study of Coulomb’s law of 
attraction of electrical charges in physics. This law states that every 
two charges in the universe attract (or repel) each other with a force 
whose direction is the line connecting them and whose magnitude is 
proportional to the magnitude of each of them and inversely propor- 
tional to the square of the distance between them. That is, 




where $\varepsilon_0$ is a constant depending on the units. As an aid in the study,
one introduces the notion of potential: The potential at a point $x$ due
to a charge $q$ is the work (or energy) required to bring a unit charge
from infinity to the point $x$. It can be shown that this potential is
independent of the path along which the charge is brought to the point
$x$ and that its value is

\[ \frac{1}{2\pi}\, \frac{q}{|x - x_0|}, \]

where $x_0$ is the position of the charge and where the constant $1/2\pi$ has
been fixed after a certain choice of units.

More generally one defines a charge distribution to be the difference 
of any two finite measures defined on the Borel sets of $R^3$, that is, the
smallest Borel field in $R^3$ containing all open sets. The potential at $x$
due to the charge distribution is again the work required to bring a 
unit charge from infinity to the point x. Since force (and hence work) 
are additive, the potential due to a charge distribution consisting of
charges $q_1, \ldots, q_N$ at points $x_1, \ldots, x_N$ is

\[ \frac{1}{2\pi} \sum_{j=1}^{N} \frac{q_j}{|x - x_j|}. \]

Passing to the limit in an appropriate sense, we would expect the
potential due to an arbitrary charge distribution $\mu$ to be

\[ \frac{1}{2\pi} \int_{R^3} \frac{d\mu(y)}{|x - y|}. \]

After checking that such an expression is always well defined, we shall 
define a potential to be any function of this form. 
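
As a small illustration of the finite sum above, the following sketch evaluates the potential of a handful of point charges in $R^3$. The helper name `point_charge_potential` and the list-of-pairs layout are assumptions made for this example only; the constant $1/2\pi$ is the one fixed in the text.

```python
import math

def point_charge_potential(x, charges):
    """Potential (1/2pi) * sum_j q_j / |x - x_j| at a point x in R^3.

    `charges` is a list of (q_j, x_j) pairs; x and each x_j are 3-tuples.
    The name and data layout are illustrative, not taken from the text.
    """
    total = 0.0
    for q, pos in charges:
        dist = math.dist(x, pos)
        if dist == 0.0:
            raise ValueError("potential is not defined at a charge location")
        total += q / dist
    return total / (2.0 * math.pi)

# A unit positive and a unit negative charge on the third coordinate axis.
dipole = [(1.0, (0.0, 0.0, 1.0)), (-1.0, (0.0, 0.0, -1.0))]
v_axis = point_charge_potential((0.0, 0.0, 3.0), dipole)   # distances 2 and 4
v_plane = point_charge_potential((5.0, 0.0, 0.0), dipole)  # equidistant point
```

On the axis the two contributions are $1/2$ and $-1/4$ before the factor $1/2\pi$, and at a point equidistant from the two charges they cancel.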

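Lemma 7-2: If $\mu$ is a charge distribution, then

\[ g(x) = \frac{1}{2\pi} \int_{R^3} \frac{d\mu(y)}{|x - y|} \]

is finite a.e. with respect to Lebesgue measure.

Proof: It suffices to prove the lemma for the case where $\mu$ is a
measure, since the general case follows by taking differences. Let $B_R$
denote the closed ball about the origin of radius $R$, and form

\[ \int_{B_R} g(x)\, dx = \frac{1}{2\pi} \int_{B_R} \left[ \int_{R^3} \frac{d\mu(y)}{|x - y|} \right] dx. \]

By Fubini’s Theorem we may interchange the order of integration to
get

\[ \int_{B_R} g(x)\, dx = \frac{1}{2\pi} \int_{R^3} \left[ \int_{B_R} \frac{dx}{|x - y|} \right] d\mu(y). \]

The inside integral on the right is bounded by its value at the origin
(that is, at $y = 0$), which is some finite number $c$. Thus the right side
does not exceed

\[ \frac{1}{2\pi}\, c\, \mu(R^3) < \infty, \]

and $g$ must be finite a.e. on $B_R$. Since the countable union of the sets
$B_R$, $R = 1, 2, \ldots$, is $R^3$, we conclude that $g$ is finite a.e.

Definition 7-3: The function

\[ g(x) = \frac{1}{2\pi} \int_{R^3} \frac{d\mu(y)}{|x - y|} \]

for $\mu$ a charge distribution will be called the potential of $\mu$.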







The class of theorems relating charges and potentials and quantities
definable in terms of them and properties of $R^3$ is called classical
potential theory. The operator transforming a charge into its potential 
is called the potential operator. The kernel of the potential operator 
is called the Green’s function. 

As we have defined it, potential theory contains the subject known in 
physics as electrostatics. Our definition includes the notions of 
distance, charge, and potential, and all the quantities commonly arising 
in electrostatics are definable in terms of these three notions. As an 
illustration, Table 7-1 shows how some of the quantities arising in
electrostatics are related dimensionally to distance, charge, and
potential. The table uses the notation

    distance = x                distance  = x
    time     = t      and       charge    = q
    mass     = m                potential = V
    charge   = q

Table 7-1. Dimensions of Electrostatic Concepts

    Concept      Dimensions          Potential-Theoretic Dimensions
    ---------    ----------------    ------------------------------
    Capacity     q^2 t^2 / (m x^2)   q / V
    Charge       q                   q
    Energy       m x^2 / t^2         V q
    Field        m x / (t^2 q)       V / x
    Force        m x / t^2           V q / x
    Potential    m x^2 / (t^2 q)     V
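
The two dimension columns of Table 7-1 can be checked against each other mechanically. The sketch below is only a bookkeeping aid under an assumed encoding: each dimension is a tuple of exponents of (mass, distance, time, charge), and the helper names `mul` and `div` are introduced here for illustration.

```python
def mul(a, b):
    """Multiply two dimensions by adding exponents."""
    return tuple(i + j for i, j in zip(a, b))

def div(a, b):
    """Divide two dimensions by subtracting exponents."""
    return tuple(i - j for i, j in zip(a, b))

# Exponents of (mass m, distance x, time t, charge q).
X = (0, 1, 0, 0)
Q = (0, 0, 0, 1)
V = (1, 2, -2, -1)  # potential = m x^2 / (t^2 q)

# Each entry pairs the "Dimensions" column with the potential-theoretic column.
table = {
    "Capacity":  ((-1, -2, 2, 2), div(Q, V)),
    "Charge":    ((0, 0, 0, 1), Q),
    "Energy":    ((1, 2, -2, 0), mul(V, Q)),
    "Field":     ((1, 1, -2, -1), div(V, X)),
    "Force":     ((1, 1, -2, 0), div(mul(V, Q), X)),
    "Potential": ((1, 2, -2, -1), V),
}
consistent = {name: mech == pot for name, (mech, pot) in table.items()}
```

Every row agrees, which is the sense in which distance, charge, and potential suffice to express the other quantities.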



We give four examples to illustrate how concepts may be defined 
explicitly in terms of distance, charge, and potential. 

(1) We can reasonably ask what the total amount of work required 
to assemble a charge distribution is if only an “infinitesimal” amount 
of charge is brought into position at one time. The way to compute 
this amount of work is to integrate the potential function against the 
charge distribution, provided the integral exists. Thus we define the 
energy of a charge distribution to be the integral of its potential with 
respect to the charge, provided the integral exists. 

(2) The total charge of a charge distribution $\mu$ is $\mu(R^3)$.

(3) If a total amount of charge $q$ is put on a piece of conducting metal
in $R^3$, the charge will redistribute itself in such a way that the potential
is a constant on the set where the metal is. The situation where the
potential is constant on the metal is the one which minimizes energy
among all charges $\mu$ with total charge $q$ and with $\mu$ vanishing away from
the set where the metal is, and this situation is referred to as equilibrium.
We define an equilibrium potential for a closed set $E$ to be a potential
which is 1 on $E$ and which comes from a charge distribution which has
all its charge on $E$. An equilibrium set is a set which has an equilibrium
potential.

(4) The capacity of a conductor in $R^3$ is defined as the total amount
of charge needed to produce a unit potential on the set where the 
conductor is. We thus define the capacity of any equilibrium set to be 
the total charge of a charge distribution producing an equilibrium 
potential. 

To indicate the directions in which classical potential theory leads, 
we shall state without proof some of the theorems in the subject. The 
support of a charge is defined to be the complement of the union of all 
open sets U with the property that the charge vanishes on U and every 
measurable subset of $U$.

(1) Uniqueness of charge: A potential uniquely determines its charge. 

(2) Determination of potential: A potential is completely determined 
by its values on the support of its charge. 

(3) Uniqueness of equilibrium potential: A set E has at most one 
equilibrium potential. (This assertion is a corollary of (2).) 

(4) Characterization of equilibrium potential: The equilibrium potential
for an equilibrium set $E$ is equivalent to the pointwise infimum of all
potentials which have non-negative charges and which dominate the
constant function 1 on $E$.

(5) Principle of domination: Let $h$ and $g$ be potentials arising from
non-negative charges $\bar{\mu}$ and $\mu$, respectively. If $h$ dominates $g$ on the
support of $\mu$, then $h$ dominates $g$ everywhere and $\bar{\mu}(R^3) \ge \mu(R^3)$.

(6) Principle of balayage: If $g$ is a potential with a non-negative
charge and if $E$ is a closed set in $R^3$, then there is a unique potential $\bar{g}$
with a non-negative charge with support in $E$ such that $\bar{g} = g$ on $E$.
The potential $\bar{g}$ (called the balayage potential) satisfies $\bar{g} \le g$ everywhere,
and its total charge does not exceed the total charge of $g$.
The balayage potential is equivalent to the pointwise infimum of all
potentials which have non-negative charges and which dominate $g$ on
$E$. It is equivalent to the supremum of all potentials which are
dominated by $g$ on $E$ and whose charges have support in $E$.

(7) Principle of lower envelope: The pointwise infimum of potentials
with non-negative charges is equivalent to a potential with a non-negative
charge.

(8) Non-negative potentials: The charge distribution of a non-negative
potential has non-negative total charge.

(9) Energy of equilibrium potential: If $E$ is an equilibrium set of finite
energy, then the equilibrium potential minimizes energy among all
potentials whose charges have support in $E$ and whose total charge is
equal to the capacity of $E$.



3. Equivalence of Brownian motion and potential theory 

Kakutani [1944] observed that several of the basic quantities of 
potential theory, like equilibrium potential, had simple probabilistic 
interpretations in terms of Brownian motion. If E is an equilibrium 
set, the value of the equilibrium potential at x is the probability that a 
Brownian motion process started at x ever hits the set E. Doob and 
Hunt extended Kakutani’s work, and it gradually became clear that in
a certain sense Brownian motion and potential theory were really the 
same subject. 

To say that they are exactly the same would be to say that every 
theorem about Brownian motion should interest a person studying 
potential theory, and conversely. Although it is doubtful that this 
situation is the case at present, it is certainly true that modern develop- 
ments in the two subjects are moving more and more in this direction. 

We shall now show that there is a natural way in terms of Brownian 
motion of obtaining the operator mapping charges into potentials, and 
that, conversely, from the potential operator it is possible to recover the 
family $\{P^t\}$ of transition operators for Brownian motion. These facts
make it clear that in a technical sense the two theories are identical.

The proof in the first direction is easy and is completed by Proposition
7-4. We recall from the definition of $\mu P^t$ that

\[ (\mu P^t)(E) = \int_E f(x, t)\, dx \]

for a certain function $f(x, t)$.

Proposition 7-4: Every theorem about potentials can be formulated
as a theorem about Brownian motion. Specifically, if $\mu$ is a charge,
then the potential $g$ of $\mu$ satisfies

\[ g(x) = \lim_{T \to \infty} \int_0^T f(x, t)\, dt, \]

where

\[ (\mu P^t)(E) = \int_E f(x, t)\, dx. \]



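Proof: We may assume that $\mu \ge 0$ without loss of generality.
Then

\[ \lim_{T \to \infty} \int_0^T f(x, t)\, dt = \lim_{T \to \infty} \int_0^T \left[ \int_{R^3} (2\pi t)^{-3/2} e^{-|x - y|^2/(2t)}\, d\mu(y) \right] dt. \]

By Fubini’s Theorem we may interchange the order of integration.
The above expression is

\[ \lim_{T \to \infty} \int_{R^3} \left[ \int_0^T (2\pi t)^{-3/2} e^{-|x - y|^2/(2t)}\, dt \right] d\mu(y). \]

We make the change of variable on $t$ which sends $|x - y|^2/t$ into $u^2$.
The expression becomes

\[ \lim_{T \to \infty} \int_{R^3} \left[ \int_{|x - y|/\sqrt{T}}^{\infty} 2 (2\pi)^{-3/2} |x - y|^{-1} e^{-u^2/2}\, du \right] d\mu(y). \]

By the Monotone Convergence Theorem, this limit equals

\[ \int_{R^3} \left[ \int_0^{\infty} 2 (2\pi)^{-3/2} |x - y|^{-1} e^{-u^2/2}\, du \right] d\mu(y) = \frac{1}{2\pi} \int_{R^3} \frac{d\mu(y)}{|x - y|} = g(x). \]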
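
The limit in Proposition 7-4 can be checked numerically for a unit point charge at distance $r$. The sketch below is an illustration under stated assumptions, not part of the proof: the function names are introduced here, the quadrature is a plain midpoint rule, and the closed form uses the fact that the Gaussian tail produced by the substitution $u^2 = r^2/t$ equals $\sqrt{\pi/2}\, \mathrm{erfc}(r/\sqrt{2T})$.

```python
import math

def heat_kernel_3d(r, t):
    """Density (2 pi t)^(-3/2) exp(-r^2/(2t)) at distance r and time t > 0."""
    return (2.0 * math.pi * t) ** -1.5 * math.exp(-r * r / (2.0 * t))

def partial_integral_quadrature(r, T, steps=200_000):
    """Midpoint-rule approximation of the integral of heat_kernel_3d over 0 < t < T."""
    h = T / steps
    return h * sum(heat_kernel_3d(r, (i + 0.5) * h) for i in range(steps))

def partial_integral_closed(r, T):
    """The same integral after the substitution u^2 = r^2/t from the proof.

    2 (2 pi)^(-3/2) r^(-1) * integral_{r/sqrt(T)}^{infinity} exp(-u^2/2) du
    simplifies to erfc(r / sqrt(2T)) / (2 pi r).
    """
    return math.erfc(r / math.sqrt(2.0 * T)) / (2.0 * math.pi * r)

r = 1.5
newtonian = 1.0 / (2.0 * math.pi * r)  # kernel of the potential operator
approximations = [(T, partial_integral_closed(r, T)) for T in (1.0, 1e2, 1e4, 1e6)]
```

The values in `approximations` increase with $T$ toward `newtonian`, in line with the monotone convergence step.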



Proposition 7-4 is a precise statement of the connection between 
Brownian motion and potential theory in one direction. We see that 
formally the potential operator is

\[ \lim_{T \to \infty} \int_0^T P^t\, dt. \]

Thus to complete the proof of the equivalence of the two theories, what 
we need to do essentially is recover a sequence from its limit. Of 
course, we cannot do so unless we know some other properties of the 
sequence, and it is the isolation of these properties that makes this half 
of the equivalence difficult. We shall not go into the details here, but 
we can indicate the general approach to the problem. 

Let $C_0$ denote the set of continuous real-valued functions $f$ on $R^3$
which vanish at infinity; that is, which are such that for any $\epsilon > 0$ there
is a ball of finite radius in $R^3$ outside of which $f$ is everywhere less than $\epsilon$
in absolute value. We define $Q^t$ on $C_0$ by

\[ (Q^t f)(y) = \int_{R^3} \frac{1}{(2\pi t)^{3/2}}\, e^{-|x - y|^2/(2t)} f(x)\, dx. \]

The following facts can be checked:

(1) If $f$ is in $C_0$, then so is $Q^t f$.

(2) $\sup_y |(Q^t f)(y)| \le \sup_y |f(y)|$.

(3) $Q^{t+s} = Q^t Q^s$.

(4) $(Q^t f)(y)$ converges uniformly to $f(y)$ as $t$ decreases to 0.

(5) $(Q^t f)(y)$ converges uniformly to 0 as $t$ increases to $\infty$.

Now any set function in the class $M$ of differences of finite measures is
completely determined by its effect on all the functions in $C_0$, and a
direct calculation shows that

\[ \int_{R^3} f\, d(\mu P^t) = \int_{R^3} (Q^t f)\, d\mu \]

for every $\mu \in M$ and $f \in C_0$. It follows that $Q^t$ and this equation
completely determine $P^t$. Therefore, it is enough to recover $Q^t$ from
the potential operator in order to prove our result.

For every $f \in C_0$ such that $(1/t)[(Q^t f)(y) - f(y)]$ converges uniformly
as $t$ decreases to 0, we define

\[ Af = \lim_{t \downarrow 0} \frac{1}{t} (Q^t f - f). \]

It turns out that if $A$ is known on its entire domain of definition, then
$Q^t$ is completely determined by the definition of $A$ and by the first four
properties of $Q^t$ listed above. Thus if $A$ could be defined within
potential theory, then so could each of the operators $Q^t$: They are the
unique family of operators such that the definition of $A$ and properties
(1), (2), (3), and (4) hold. The actual proof of this existence and
uniqueness consists in writing down a concrete formula for $Q^t$ in terms
of $A$ and $t$; we reproduce it in order to show that nothing appears in the
formula except $A$, $t$, and the identity operator $I$:

\[ Q^t = \lim_{\lambda \to \infty} \sum_{k=0}^{\infty} \frac{t^k}{k!} \left[ \lambda^2 (\lambda I - A)^{-1} - \lambda I \right]^k. \]
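
To see a formula of this shape at work, one can try it on a finite-dimensional stand-in. The sketch below is an analogy only: it assumes $A$ is a bounded 2-by-2 generator matrix of a two-state continuous-time chain rather than the Brownian motion operator, and the helper names are introduced here. For such a matrix the limit should reproduce the matrix exponential $e^{tA}$, which has a closed form for the chosen example.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_inv(a):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def approx_semigroup(A, t, lam, terms=80):
    """Partial sum of sum_k t^k/k! [lam^2 (lam I - A)^(-1) - lam I]^k for 2x2 A."""
    resolvent = mat_inv([[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]])
    B = [[lam * lam * resolvent[i][j] - (lam if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    total = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[entry * t / k for entry in row] for row in mat_mul(term, B)]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

# Hypothetical generator with eigenvalues 0 and -3, so exp(tA) is explicit.
A = [[-1.0, 1.0], [2.0, -2.0]]
t = 0.7
decay = math.exp(-3.0 * t)
exact = [[2 / 3 + decay / 3, 1 / 3 - decay / 3],
         [2 / 3 - 2 * decay / 3, 1 / 3 + 2 * decay / 3]]

def max_error(lam):
    approx = approx_semigroup(A, t, lam)
    return max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
```

Evaluating `max_error` for increasing values of `lam` shows the approximation settling on `exact`, using nothing but $A$, $t$, and $I$.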

For every $f$ in $C_0$ such that $\int_0^T (Q^t f)(y)\, dt$ converges uniformly as
$T \to \infty$, we define

\[ Gf = \lim_{T \to \infty} \int_0^T (Q^t f)\, dt. \]

It can be shown readily from the five properties of $Q^t$ that $G$ and $-A$
are inverse operators on their respective domains. Thus each uniquely
determines the other. Finally (and here is where some work is required)
$G$ looks sufficiently like the potential operator when its definition is
compared with the formulas of Proposition 7-4 that the potential
operator determines $G$. Thus the potential operator determines $G$,
$G$ determines $A$, $A$ determines $Q^t$, and $Q^t$ determines $P^t$. Hence every
theorem of Brownian motion theory is a theorem of potential theory.

4. Brownian motion and potential theory in n dimensions

In the mathematical formulation of Brownian motion and potential
theory there is no need to restrict the underlying space to be three-dimensional.
We can define an $n$-dimensional Brownian motion
operator by

\[ (\mu P^t)(E) = \int_E \left[ \int_{R^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x - y|^2/(2t)}\, d\mu(x) \right] dy. \]

The potential operator differs in appearance from dimension to dimension
more than the Brownian motion operator does, but its kernel is
still a constant multiple of the integral of $|x|^{1-n}$. The potential
$g(x)$ of $\mu$, the difference of two finite measures, is defined by

\[ g(x) = \begin{cases} -\displaystyle\int_{R^1} |x - y|\, d\mu(y) & \text{in dimension 1} \\[2ex] -\dfrac{1}{\pi} \displaystyle\int_{R^2} \log |x - y|\, d\mu(y) & \text{in dimension 2} \\[2ex] c_n \displaystyle\int_{R^n} \frac{d\mu(y)}{|x - y|^{n-2}} & \text{in dimension } n \ge 3, \end{cases} \]

where

\[ c_n = \tfrac{1}{2} \pi^{-n/2}\, \Gamma\!\left(\tfrac{1}{2}(n - 2)\right). \]



In dimension $n \ge 3$, $g(x)$ is necessarily finite a.e., but in dimensions 1
and 2 we shall need to assume that $g$ is finite a.e.

The fact that the kernel $1/|x|^{n-2}$ tends to zero at infinity in dimension
$n \ge 3$ but the kernels $|x - y|$ and $\log |x - y|$ do not tend to zero in
dimensions 1 and 2 gives us a clue that the potential theory for dimensions
1 and 2 will differ sharply from that in higher dimensions. We
shall discuss the reason for this difference shortly.

In dimension $n \ge 3$, Brownian motion theory and potential theory
are again equivalent. The formula

\[ g(x) = \lim_{T \to \infty} \int_0^T \frac{1}{(2\pi t)^{n/2}} \int_{R^n} e^{-|x - y|^2/(2t)}\, d\mu(y)\, dt \]

generalized from Proposition 7-4 is still valid, and the discussion of
Section 3 goes over with little change to establish the equivalence.

But in dimensions 1 and 2, it does not. The above formula is not
true for these dimensions, and the argument in Section 3 fails after the
operator $G$ is introduced. The reason for this failure is the following.
We recall that in dimension $n \ge 3$ the potential operator is formally
$\lim_{T \to \infty} \int_0^T P^t\, dt$. In dimensions greater than or equal to 3, this
quantity is finite, whereas in dimensions 1 and 2 it is infinite. Now
$\int_0^{\infty} P^t\, dt$ plays much the same role for Brownian motion that
$\sum_{n=0}^{\infty} P^n$ plays for denumerable Markov chains. It is finite if the
process is transient, and infinite if the process is recurrent. In fact, the
distinction between transience and recurrence is what is relevant for
Brownian motion here: In dimensions 3 and greater a Brownian motion
particle after leaving the unit ball of $R^n$ returns to it with probability
less than one, whereas in dimensions 1 and 2 it returns with probability
equal to one.

The potential operator in dimensions 1 and 2 arises in a different way. 
Specifically, the formula generalized from Proposition 7-4 is not valid in 
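general, but it is valid if $\mu$ has total charge zero and if a mild additional
condition is satisfied. The exact formulation of this result will interest
us later, and we give it as the next proposition.

Proposition 7-5: In $R^n$ for $n = 1$ or 2, let $\mu = \mu^+ - \mu^-$ be the
difference of two finite measures, and suppose that

\[ \int_{R^1} |x - y|\, d\mu^+(y) < \infty \quad \text{and} \quad \int_{R^1} |x - y|\, d\mu^-(y) < \infty \quad \text{a.e. if } n = 1 \]

or

\[ \int_{R^2} \bigl| \log |x - y| \bigr|\, d\mu^+(y) < \infty \quad \text{and} \quad \int_{R^2} \bigl| \log |x - y| \bigr|\, d\mu^-(y) < \infty \quad \text{a.e. if } n = 2. \]

If $\mu(R^n) \ne 0$, then

\[ \lim_{T \to \infty} \int_0^T \frac{1}{(2\pi t)^{n/2}} \int_{R^n} e^{-|x - y|^2/(2t)}\, d\mu(y)\, dt = +\infty \text{ or } -\infty \quad \text{a.e.} \]

If $\mu(R^n) = 0$, then

\[ g(x) = \lim_{T \to \infty} \int_0^T \frac{1}{(2\pi t)^{n/2}} \int_{R^n} e^{-|x - y|^2/(2t)}\, d\mu(y)\, dt \]

exists, is finite a.e., and satisfies

\[ g(x) = \begin{cases} -\displaystyle\int_{R^1} |x - y|\, d\mu(y) & \text{if } n = 1 \\[2ex] -\dfrac{1}{\pi} \displaystyle\int_{R^2} \log |x - y|\, d\mu(y) & \text{if } n = 2. \end{cases} \]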



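Proof: We prove the result for $n = 1$; the ideas in the proof for
$n = 2$ are similar. The same calculation as at the beginning of the
proof of Proposition 7-4 shows that for $n = 1$

\[ \int_0^T \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} e^{-|x - y|^2/(2t)}\, d\mu(y)\, dt = \int_{-\infty}^{\infty} \frac{2}{\sqrt{2\pi}}\, |x - y| \left[ \int_{|x - y|/\sqrt{T}}^{\infty} u^{-2} e^{-u^2/2}\, du \right] d\mu(y). \]

If we use integration by parts on the terms in the brackets, differentiating
the exponential and integrating $u^{-2}$, we find that the right side is

\[ \int_{-\infty}^{\infty} \frac{2}{\sqrt{2\pi}}\, |x - y| \left[ \frac{\sqrt{T}}{|x - y|}\, e^{-|x - y|^2/(2T)} - \int_{|x - y|/\sqrt{T}}^{\infty} e^{-u^2/2}\, du \right] d\mu(y) \]

\[ = -\int_{-\infty}^{\infty} |x - y| \left[ \frac{2}{\sqrt{2\pi}} \int_{|x - y|/\sqrt{T}}^{\infty} e^{-u^2/2}\, du \right] d\mu(y) + \frac{2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sqrt{T}\, e^{-|x - y|^2/(2T)}\, d\mu(y). \]

We let $T$ tend to $\infty$ and consider each term separately. In the first
term the expression in brackets increases to 1. If we write the integral
as the difference of one with respect to $\mu^+$ and one with respect to $\mu^-$
and use the fact that

\[ \int_{-\infty}^{\infty} |x - y|\, d\mu^+(y) < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} |x - y|\, d\mu^-(y) < \infty \quad \text{a.e.,} \]

we see by the Dominated Convergence Theorem that the first term tends
a.e. to

\[ -\int_{-\infty}^{\infty} |x - y|\, d\mu(y). \]

Next we consider the second term. Suppose first that $\mu(R^1) \ne 0$.
The second term tends, as $T \to \infty$, to

\[ \lim_{T \to \infty} \sqrt{T} \cdot \lim_{T \to \infty} \frac{2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-|x - y|^2/(2T)}\, d\mu(y), \]

and the integral and the second limit may be interchanged by dominated
convergence to become

\[ \lim_{T \to \infty} \sqrt{T} \cdot \frac{2}{\sqrt{2\pi}}\, \mu(R^1) = +\infty \text{ or } -\infty. \]

To complete the proof we shall show that if $\mu(R^1) = 0$, then

\[ \int_{-\infty}^{\infty} \sqrt{T}\, e^{-|x - y|^2/(2T)}\, d\mu(y) \]

tends to zero a.e. as $T \to \infty$. Since $\mu(R^1) = 0$, we have

\[ \int_{-\infty}^{\infty} \sqrt{T}\, e^{-|x - y|^2/(2T)}\, d\mu(y) = \int_{-\infty}^{\infty} \sqrt{T} \left( e^{-|x - y|^2/(2T)} - 1 \right) d\mu(y). \]

We shall prove that the right side even tends to zero when $\mu$ is replaced
by $\mu^+$ or $\mu^-$. Let us do so for $\mu^+$. First we show that

\[ \left| \sqrt{T} \left( e^{-|z|^2/(2T)} - 1 \right) \right| \le k |z| \]

for a fixed constant $k$ and for all $T$. By differential calculus methods,
we find that

\[ \frac{\left| e^{-|z|^2/(2T)} - 1 \right|}{|z|} \]

assumes its maximum value as a function of $|z|$ when $|z|$ satisfies

\[ \frac{|z|^2}{T} = e^{|z|^2/(2T)} - 1. \]

The unique positive solution occurs for $2 < |z|^2/T < 3$. Let $b =
|z|^2/T$ with $2 < b < 3$ be the point at which the solution occurs. Then

\[ \sqrt{T}\, \frac{\left| e^{-|z|^2/(2T)} - 1 \right|}{|z|} \le \sqrt{T}\, \frac{\left| e^{-b/2} - 1 \right|}{\sqrt{bT}} = \frac{1 - e^{-b/2}}{\sqrt{b}} = k \]

and

\[ \left| \sqrt{T} \left( e^{-|z|^2/(2T)} - 1 \right) \right| \le k |z|. \]

Put $|z| = |x - y|$. Since

\[ \int_{-\infty}^{\infty} k |x - y|\, d\mu^+(y) < \infty \quad \text{a.e.,} \]

we have by dominated convergence

\[ \lim_{T \to \infty} \int_{-\infty}^{\infty} \sqrt{T} \left( e^{-|x - y|^2/(2T)} - 1 \right) d\mu^+(y) = \int_{-\infty}^{\infty} \lim_{T \to \infty} \left[ \sqrt{T} \left( e^{-|x - y|^2/(2T)} - 1 \right) \right] d\mu^+(y) \]

for almost every $x$. The integrand on the right side is identically zero.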
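
The dichotomy in Proposition 7-5 can be observed numerically in dimension 1. The sketch below is illustrative only: the function names are assumptions, the charge is taken to be a unit mass at one point minus a unit mass at another (so its total charge is zero), and the closed form for a single point mass is the pair of terms produced by the integration by parts in the proof, written with `erfc`.

```python
import math

def partial_integral_1d(r, T):
    """Integral over 0 < t < T of (2 pi t)^(-1/2) exp(-r^2/(2t)) dt for r >= 0.

    First term:  -r * erfc(r / sqrt(2T))           (bracket increasing to 1)
    Second term: sqrt(2T/pi) * exp(-r^2/(2T))      (grows like sqrt(T))
    """
    return (-r * math.erfc(r / math.sqrt(2.0 * T))
            + math.sqrt(2.0 * T / math.pi) * math.exp(-r * r / (2.0 * T)))

def partial_integral_two_points(x, y_plus, y_minus, T):
    """Same integral for mu = (unit mass at y_plus) - (unit mass at y_minus)."""
    a, b = abs(x - y_plus), abs(x - y_minus)
    first = -a * math.erfc(a / math.sqrt(2.0 * T)) + b * math.erfc(b / math.sqrt(2.0 * T))
    # expm1 keeps the difference of two nearly equal exponentials accurate.
    second = math.sqrt(2.0 * T / math.pi) * (
        math.expm1(-a * a / (2.0 * T)) - math.expm1(-b * b / (2.0 * T)))
    return first + second

T = 1e10
single = partial_integral_1d(1.0, T)                      # total charge 1: diverges
balanced = partial_integral_two_points(0.0, 1.0, 3.0, T)  # total charge 0
expected = -abs(0.0 - 1.0) + abs(0.0 - 3.0)               # -|x - y+| + |x - y-| = 2
```

With a single point mass the value grows like $\sqrt{2T/\pi}$, while for the balanced pair it approaches $-\int |x - y|\, d\mu(y)$.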



The hypotheses of Propositions 7-4 and 7-5 are worth reviewing and
comparing. In the transient case, Proposition 7-4, we started with
any element $\mu$ of $M$ and we were able to conclude both that the potential
operator was defined on $\mu$ and that its value was the Brownian motion
limit. In the recurrent case, Proposition 7-5, we started with any
element $\mu$ of $M$ and we had to assume that the potential operator was
defined on $\mu$; we then concluded that the potential of $\mu$ was equal to
the Brownian motion limit if and only if $\mu$ had total charge zero. We
shall see that the same thing happens in potential theory defined for
denumerable Markov chains.

5. Potential theory for denumerable Markov chains 

We turn our attention now to those properties of Brownian motion 
which relate it to potential theory. In this section we shall answer 
the following questions: 

(1) How can the connection between Brownian motion and classical 
potential theory be used to define a potential theory for denumerable 
Markov chains? 

(2) How does potential theory differ in the transient and recurrent 
cases, and what form does the potential operator take? 

(3) What is the nature of the inverse operator that transforms 
potentials into charges? 

(4) What other Markov chain concepts play a role in potential 
theory? 

For definiteness, let P be a denumerable Markov chain which either is 
recurrent or is transient with no absorbing states, and let $\alpha$ be a positive
finite-valued $P$-superregular measure.

Before defining a potential theory for denumerable Markov chains,
we should discuss some properties of the operators $P^t$ and $Q^t$. The
operators $P^t$ and $Q^t$ act respectively on differences of finite measures
and on functions in $C_0$ according to the equations

\[ (\mu P^t)(E) = \int_E \left[ \int_{R^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x - y|^2/(2t)}\, d\mu(x) \right] dy \]

and

\[ (Q^t f)(y) = \int_{R^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x - y|^2/(2t)} f(x)\, dx, \]

and they are related by the identity

\[ \int_{R^n} f\, d(\mu P^t) = \int_{R^n} (Q^t f)\, d\mu. \]

The linearity properties

\[ (\mu + \nu) P^t = \mu P^t + \nu P^t \]

and

\[ (c\mu) P^t = c(\mu P^t), \]

together with a certain property of continuity, imply that the action of
$P^t$ on differences of finite measures is analogous to the action of a matrix
on row vectors. Similarly, the corresponding properties for $Q^t$ imply
that the action of $Q^t$ on functions in $C_0$ is like that of a matrix on column
vectors. But the real insight into $P^t$ and $Q^t$ comes in realizing that
these matrices are identical. To be more specific, we must reformulate
the assertion for a countable space.

Let $P^*$ be a continuous linear operator on row vectors $\mu$ which have a
finite sum, and let $Q^*$ be a continuous linear operator on column vectors
$f$ whose components tend to 0. Suppose that $P^*$ and $Q^*$ are related by
the identity

\[ (\mu P^*) \cdot f = \mu \cdot (Q^* f) \]

for all $\mu$ and $f$ of the type specified, where $\cdot$ stands for vector multiplication.
Let $\delta^{(i)}$ and $\delta_{(i)}$ be the row and column vectors having $i$th
component equal to 1 and all other components equal to 0. The vector
$\delta^{(i)}$ is in the domain of $P^*$, and $\delta_{(i)}$ is in the domain of $Q^*$. If we define
a square matrix $\{P^*_{ij}\}$ by

\[ P^*_{ij} = (\delta^{(i)} P^*) \cdot \delta_{(j)} = (\delta^{(i)} P^*)_j, \]

then

\[ (\mu P^*)_j = \sum_k \mu_k (\delta^{(k)} P^*)_j = \sum_k \mu_k P^*_{kj}. \]

Hence the operator $P^*$ may be represented by the matrix $\{P^*_{ij}\}$. But
by the identity relating $P^*$ and $Q^*$,

\[ P^*_{ij} = (\delta^{(i)} P^*) \cdot \delta_{(j)} = \delta^{(i)} \cdot (Q^* \delta_{(j)}) = (Q^* \delta_{(j)})_i. \]

Hence by a similar argument, the operator $Q^*$ is also representable by
the same matrix $\{P^*_{ij}\}$.

Thus the denumerable Markov chain analog of the pair of operators
$P^t$ and $Q^t$ can be expected to be a single matrix depending on $t$. For
$t = 1$, this matrix can be taken to be the transition matrix $P$ of the
Markov chain. Then the relations $P^{t+s} = P^t P^s$ and $Q^{t+s} = Q^t Q^s$ for
integers $t$ and $s$ imply that the analog of $P^t$ and $Q^t$ for any other
integer value of $t$ is a power of the matrix $P$.

Lebesgue measure has a special property with respect to Brownian
motion which is summed up in the equation

\[ \int_E dx = \int_{R^n} \left[ \int_E \frac{1}{(2\pi t)^{n/2}}\, e^{-|x - y|^2/(2t)}\, dy \right] dx. \]

If we call Lebesgue measure $\sigma$ and use notation that earlier was reserved
for finite measures, this equation becomes

\[ \sigma = \sigma P^t. \]

That is, $\sigma$ is regular for $P^t$. Thus the analog of $\sigma$ for the Markov chain
$P$ should be a $P$-regular measure. But if $P$ is transient, then $P$ need
not have a regular measure. We therefore relax our requirement and
ask only that the analog of $\sigma$ be a $P$-superregular measure. We can
then decree that the specified measure $\alpha$ is to be the analog of $\sigma$.

The problem of defining potentials for Markov chains becomes a
problem of translating notions about Brownian motion into notions
about Markov chains. Following Propositions 7-4 and 7-5, we recall
that a potential $g(x)$ is obtained from a charge $\mu$ in this way: If we
abbreviate the equation

\[ (\mu P^t)(E) = \int_E \left[ \int_{R^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x - y|^2/(2t)}\, d\mu(x) \right] dy \]

as

\[ (\mu P^t)(E) = \int_E f(x, t)\, dx, \]

then

\[ g(x) = \lim_{T \to \infty} \int_0^T f(x, t)\, dt. \]

Translating the relation for $(\mu P^t)(E)$ into notions about Markov chains,
we write

\[ \sum_{i \in E} (\mu P^n)_i = \sum_{i \in E} f^{(n)}_i \alpha_i. \]

If $E$ is the one-point set $\{i\}$, we find that

\[ f^{(n)}_i = \frac{(\mu P^n)_i}{\alpha_i} \]

or

\[ f^{(n)} = \text{dual}\, (\mu P^n). \]

The equation defining $g(x)$ translates into

\[ g = \lim_{n \to \infty} \left[ f^{(0)} + f^{(1)} + \cdots + f^{(n)} \right] \]

or

\[ \boxed{\, g = \text{dual} \lim_{n \to \infty} \left[ \mu (I + P + \cdots + P^n) \right] \,} \]

Classically, potentials are left as point functions and are never transformed
into set functions because such a transformation is frequently
impossible. In Markov chain potential theory, however, every column
vector can be transformed into a row vector by the duality mapping.
If we take the dual of both sides of the boxed equation, we get

\[ \text{dual}\, g = \lim_{n \to \infty} \left[ \mu (I + P + \cdots + P^n) \right]. \]

For simplicity in notation we shall adopt the convention that dual $g$,
and not $g$, is the potential of $\mu$. We can at last formulate a definition.

Definition 7-6: Any row vector $\mu$ with $\mu\mathbf{1}$ well defined and finite for
which the limit

\[ \nu = \lim_{n \to \infty} \left[ \mu (I + P + \cdots + P^n) \right] \]

exists and is finite-valued is called a left charge with potential measure $\nu$.

The condition that $\mu\mathbf{1}$ be well defined and finite is the analog of the
condition that a charge in $R^n$ be the difference of two finite measures.
The boxed equation for $g$ yields an alternate possibility, namely

\[ g = \lim_{n \to \infty} \left[ (I + \hat{P} + \cdots + \hat{P}^n)(\text{dual}\, \mu) \right] \]

with $\alpha(\text{dual}\, \mu) = \mu\mathbf{1}$ finite. If we had gone through the same argument
for the process $\hat{P}$, we would have obtained the same equation with the
carets removed. We therefore complement Definition 7-6 as follows.

Definition 7-7: Any column vector $f$ with $\alpha f$ well defined and finite
for which the limit

\[ g = \lim_{n \to \infty} \left[ (I + P + \cdots + P^n) f \right] \]

exists and is finite-valued is called a right charge with potential function $g$.

From our knowledge of what happens with Brownian motion, we 
should expect that the Markov chain potential operators will arise in 
different ways in the transient and recurrent cases. Consequently, we 
shall treat the different kinds of processes separately, handling the 
transient case in Chapter 8 and the recurrent case in Chapter 9. 

In the transient case of Brownian motion the operator was formally

\[ \lim_{T \to \infty} \int_0^T P^t\, dt = \int_0^{\infty} P^t\, dt, \]

and it is no surprise that for Markov chains it is the matrix $N =
\sum_{n=0}^{\infty} P^n$ which turns out to be the potential operator both for left
charges and right charges (see Theorem 8-3). Once we have the
potential operator, it will not be difficult to develop a full theory in
analogy with classical potential theory.

In the recurrent case, however, the problem of finding the potential
operator is not so easy. The information that we will find, just as in
Proposition 7-5, is that for any charge of total charge zero on which the
potential operator is defined the potential operator should agree with
the operator which is formally

\[ \lim_{n \to \infty} (I + P + \cdots + P^n). \]

It will turn out that there are many possible potential operators for left
charges and many others for right charges. Of these the matrices
$-C$ and $-G$, respectively, will be representative (see Definition 9-24).
But if we ask that the same matrix work for both left and right charges,
then we shall see that there is a matrix $K$ such that all such potential
operators are of the form $-K + c\mathbf{1}\alpha$, where $c$ is a constant (see Theorem
9-84). With $-K$ as our operator, we have some hope of imitating
classical potential theory if we redefine charge and potential in terms of
$K$: The column vector $f$ is a charge, for instance, with potential $g$ if

\[ g = -Kf. \]

From this new definition of charges and potentials, we shall be able, just
as in the transient case, to prove theorems which are analogs of some of
the main results of classical potential theory.

In discussing the relation of Brownian motion to potential theory in
Section 3, we mentioned that the operator inverse to $-A$, where

\[ Af = \lim_{t \downarrow 0} \frac{1}{t} (Q^t f - f), \]

was of the same form as the potential operator. It is thus quite
believable that $-A$ should be essentially the inverse operator that
transforms potentials into charges. Now the definition of $A$ involves a
derivative, and when concepts in Brownian motion are translated into
concepts in Markov chains, derivatives transform into differences.
Therefore, the proper analog of $Af$ for Markov chains is $Pf - f =
(P - I)f$. That is, $I - P$ plays the role of $-A$. In Theorems 8-4
and 9-15 we shall see that $I - P$ is indeed the operator that transforms
potentials in the sense of Definitions 7-6 and 7-7 into charges.
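
The relation between the partial sums of Definition 7-7 and the operator $I - P$ can be tried out on a small example. The sketch below rests on assumptions that are not in the text: a hypothetical 3-state substochastic matrix stands in for a transient denumerable chain (mass leaks out of the end states), and the helper names are introduced here. It forms $g = \lim (I + P + \cdots + P^n) f$ by iteration and then applies $I - P$ to $g$.

```python
def mat_vec(M, v):
    """Matrix times column vector, both given as Python lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def right_potential(P, f, tol=1e-14, max_terms=10_000):
    """Partial sums (I + P + ... + P^n) f, stopped once the new term is below tol."""
    g = list(f)
    term = list(f)
    for _ in range(max_terms):
        term = mat_vec(P, term)
        g = [gi + ti for gi, ti in zip(g, term)]
        if max(abs(ti) for ti in term) < tol:
            return g
    raise RuntimeError("partial sums did not settle; is the chain transient?")

# Hypothetical substochastic matrix: from states 0 and 2 the process
# disappears with probability 1/2, so every state is transient.
P = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
f = [1.0, -2.0, 0.5]                                # a right charge
g = right_potential(P, f)                           # its potential function
recovered = [gi - pgi for gi, pgi in zip(g, mat_vec(P, g))]  # (I - P) g
```

Up to rounding, `recovered` agrees with `f`, which is the finite counterpart of the statement that $I - P$ transforms potentials back into charges.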

With Brownian motion the operator $A$ is a constant multiple of the
Laplacian operator $\Delta$ for smooth enough functions, where

\[ \Delta f = \frac{\partial^2 f}{\partial x_1^2} + \cdots + \frac{\partial^2 f}{\partial x_n^2}. \]

If a function $f$ satisfies the equation $\Delta f = 0$ in a neighborhood of a
point $x$, then $f$ is said to be harmonic in a neighborhood of $x$. The
analog in the case of denumerable Markov chains is that if a column
vector $f$ satisfies $(P - I)f = 0$ at the point $i$, then $f$ is regular at $i$.
Thus we can expect that regular functions will have some of the same
behavior for Markov chains that harmonic functions have classically.
As an example, a function harmonic on a connected open set in $R^n$
cannot assume its maximum value inside that open set unless the
function is constant. We shall see in Corollary 8-44 that an analogous
result holds for Markov chains.

A twice continuously differentiable function $f$ is said to be superharmonic
if $\Delta f \le 0$. The analog of this property is the condition that
$(P - I)f \le 0$ or $f \ge Pf$. Thus the analog of a superharmonic function
is a superregular function.

Table 7-2. Markov Chain Analogs of Potential Theory Concepts

    Classical Notion           Markov Chain Analog
    -----------------------    ------------------------------------
    R^n                        State space S
    P^t and Q^t                P^n
    Lebesgue measure           alpha
    Potential                  lim mu(I + P + ... + P^n)
                               or lim (I + P + ... + P^n) f
    Total charge               mu 1 or alpha f
    Potential operator         Transient: N
                               Recurrent: -K
    Inverse operator           I - P
    Harmonic function          Regular function
    Superharmonic function     Superregular function
    Connected set              Communicating set



6. Brownian motion as a limit of the symmetric random walk 

The symmetric random walk in $n$ dimensions was defined in Chapter 4
as a Markov chain obtained from sums of independent random variables
on the integer lattice in $R^n$ with the probability of going from any state
to any of the $2n$ neighboring states equal to $1/(2n)$. In potential
theory for Markov chains this process assumes the role of the “classical
case,” exhibiting in its potential theory much of the special behavior of
the theory in Section 2. For instance, the matrix of the potential
operator for this process has the same asymptotic behavior at infinity
as the potential kernel has classically: $\log |x|$ in two dimensions,
$1/|x|$ in three dimensions, and so on.

The reason for this coincidence is that Brownian motion is in a
precise sense the limit of the symmetric random walk. Specifically if
the random walk is considered first on the integer lattice, then on the
half-integer lattice, then on the quarter-integer lattice, and so on, then
the probabilities in the $k$th process of being in a fixed ball in $R^n$ after
time $4^k t$ converge to the probability in Brownian motion of being in
that ball after time $t$. We shall prove this result only in the one-dimensional
case, and we make use of the Central Limit Theorem
(Theorem 1-68).

Consider for fixed $k$ the random walk on the line having as states the
points of the form $j 2^{-k}$, for $j$ any integer, and having transition probabilities
$\tfrac{1}{2}$ from any state to each of its two neighbors. This process is
the symmetric random walk with a change in scale. Let $x^{(k)}_n$ be the
$n$th outcome function, and let $x^{(k)}_0 = 0$. As in Section 1, we let $x_t$
denote the position in Brownian motion.

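Proposition 7-8: Brownian motion in one dimension is a limit of the
symmetric random walk in this sense: If $t$ is a dyadic rational and if $\alpha$
and $\beta$ are real numbers, then

\[ \lim_{k \to \infty} \Pr_0[x^{(k)}_{4^k t} \in (\alpha, \beta)] = \Pr_0[x_t \in (\alpha, \beta)]. \]

Proof: The random variables $x^{(k)}_n - x^{(k)}_{n-1}$ are independent and
identically distributed and have mean 0 and variance

\[ \sigma^2 = \frac{1}{2} \left( \frac{1}{2^k} \right)^2 + \frac{1}{2} \left( -\frac{1}{2^k} \right)^2 = \frac{1}{4^k}. \]

Let $m = 4^k t$ be an integer. Since

\[ x^{(k)}_m = \sum_{n=1}^{m} \left( x^{(k)}_n - x^{(k)}_{n-1} \right), \]

$x^{(k)}_m$ has mean 0 and variance $m/4^k = t$. Hence, by the Central Limit
Theorem,

\[ \lim_{k \to \infty} \Pr_0\!\left[ \frac{\alpha}{\sqrt{t}} < \frac{x^{(k)}_{4^k t}}{\sqrt{t}} < \frac{\beta}{\sqrt{t}} \right] = \Phi\!\left( \frac{\beta}{\sqrt{t}} \right) - \Phi\!\left( \frac{\alpha}{\sqrt{t}} \right) \]

or

\[ \lim_{k \to \infty} \Pr_0[\alpha < x^{(k)}_{4^k t} < \beta] = \Phi\!\left( \frac{\beta}{\sqrt{t}} \right) - \Phi\!\left( \frac{\alpha}{\sqrt{t}} \right). \]

On the other hand, by definition,

\[ \Pr_0[x_t \in (\alpha, \beta)] = \int_{\alpha}^{\beta} \frac{1}{\sqrt{2\pi t}}\, e^{-y^2/(2t)}\, dy = \int_{\alpha/\sqrt{t}}^{\beta/\sqrt{t}} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du = \Phi\!\left( \frac{\beta}{\sqrt{t}} \right) - \Phi\!\left( \frac{\alpha}{\sqrt{t}} \right). \]

Therefore,

\[ \lim_{k \to \infty} \Pr_0[x^{(k)}_{4^k t} \in (\alpha, \beta)] = \Pr_0[x_t \in (\alpha, \beta)]. \]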
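
The convergence in Proposition 7-8 can be examined with exact binomial probabilities instead of simulation. The sketch below is an illustration under assumptions: the function names are introduced here, $4^k t$ is required to be an integer, and the position after $m$ steps is written as $(2B - m)2^{-k}$ with $B$ binomial with parameters $m$ and $1/2$.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function expressed through erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def scaled_walk_probability(k, t, a, b):
    """Pr_0[a < x_m < b] for the walk with step 2^-k after m = 4^k t steps.

    Assumes 4^k * t is an integer. The count of upward steps is Binomial(m, 1/2),
    and the position is (2 * ups - m) * 2^-k.
    """
    m_real = (4 ** k) * t
    m = round(m_real)
    if abs(m - m_real) > 1e-9:
        raise ValueError("4^k * t must be an integer")
    scale = 2.0 ** -k
    hits = sum(math.comb(m, j) for j in range(m + 1) if a < (2 * j - m) * scale < b)
    return hits / 2 ** m

def brownian_probability(t, a, b):
    """Pr_0[x_t in (a, b)] for one-dimensional Brownian motion."""
    return std_normal_cdf(b / math.sqrt(t)) - std_normal_cdf(a / math.sqrt(t))

target = brownian_probability(1.0, -1.0, 1.0)
walk_values = [(k, scaled_walk_probability(k, 1.0, -1.0, 1.0)) for k in range(1, 7)]
```

For small $k$ the lattice is coarse and the endpoints of the interval carry visible mass; as $k$ grows the entries of `walk_values` approach `target`.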





7. Symmetric random walk in n dimensions 

As was mentioned in Section 6, the $n$-dimensional symmetric random 
walk and $n$-dimensional Brownian motion share a number of properties 
because the second process is the limit of the first. Of these we shall 
prove for the random walk just two: that the process is recurrent in 
dimensions 1 and 2 and transient in dimension $n \ge 3$, and that in the 
transient case the columns of the $N$-matrix tend to zero. The second 
result is in analogy with the behavior of the potential kernel 
in Brownian motion. 

For the first problem we note immediately that all states communicate 
in the random walks of all dimensions; hence each of them is 
either transient or recurrent. In one dimension the state space is the 
integers, and 

$$P_{i,i+1} = P_{i,i-1} = \tfrac12.$$

Since the mean step is zero, the process is recurrent by Proposition 
5-22. A more direct proof of the recurrence proceeds as follows. It is 
impossible to get from state 2 to state 0 without going through 1. 
This fact, together with the translation invariance of the hitting 
probabilities, implies that 

$$H_{20} = H_{21}H_{10} = H_{10}^2.$$

But 

$$\bar H_{00} = \tfrac12 H_{10} + \tfrac12 H_{-1,0} = \tfrac12 H_{10} + \tfrac12 H_{01} = H_{10}$$

since $H_{01} = H_{10}$. Therefore, the identity 

$$H_{10} = \tfrac12 + \tfrac12 H_{20} = \tfrac12 + \tfrac12 H_{10}^2$$

implies that $H_{10} = 1$ and hence $\bar H_{00} = 1$. Consequently, the process 
is recurrent. Still a third proof can be based on a calculation of $N_{00}$. 
In fact, we have 

$$P_{00}^{(2n)} = \binom{2n}{n} 2^{-2n}$$

because in order for the process to return in $2n$ steps to 0, it must make 
$n$ steps to the right and $n$ to the left; each such possibility has 
probability $2^{-2n}$. By Lemma 1-59, 

$$\binom{2n}{n} 2^{-2n} \sim \frac{1}{\sqrt{\pi n}}.$$

Hence the tail of the series $N_{00} = \sum_n P_{00}^{(n)}$ dominates a constant multiple 
of the tail of the series $\sum_n 1/\sqrt n$. Therefore, $N_{00}$ is infinite and the 
process is recurrent. 
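The third proof above rests on the size of $P_{00}^{(2n)}$. As a rough numerical illustration (a sketch with hypothetical function names, not taken from the text), the exact return probabilities can be compared with $1/\sqrt{\pi n}$ and the partial sums of $N_{00}$ can be seen to grow without bound.

```python
from math import comb, pi, sqrt

def return_prob_1d(n):
    """Exact P_00^(2n) = C(2n, n) / 2**(2n) for the one-dimensional walk."""
    return comb(2 * n, n) / 4 ** n

def partial_sum_1d(terms):
    """Sum of P_00^(2n) for n = 0, ..., terms - 1, a partial sum of N_00."""
    return sum(return_prob_1d(n) for n in range(terms))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The ratio to 1/sqrt(pi*n) should approach 1.
        print(n, return_prob_1d(n) * sqrt(pi * n))
    for terms in (100, 400, 1600):
        # Partial sums keep growing, roughly like sqrt(terms).
        print(terms, partial_sum_1d(terms))
```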
In the two-dimensional case there are two simultaneous processes 
going on in perpendicular directions, and for the process to return to 
the origin, both of them must return to their zero positions at the same 
time. Letting $k$ be the number of steps to the right and $n - k$ the 
number of steps up in $2n$ steps, we have 

$$P_{00}^{(2n)} = 4^{-2n} \sum_{k=0}^{n} \binom{2n}{k,\, k,\, n-k,\, n-k}.$$

If we multiply numerator and denominator of the multinomial 
coefficient by $(n!)^2$, we see that it equals $\binom{2n}{n}\binom{n}{k}^2$. Thus 

$$P_{00}^{(2n)} = 4^{-2n} \binom{2n}{n} \sum_{k=0}^{n} \binom{n}{k}^2.$$

The identity 

$$\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}$$

shows that 

$$P_{00}^{(2n)} = \left[2^{-2n}\binom{2n}{n}\right]^2.$$

Since 

$$2^{-2n}\binom{2n}{n} \sim \frac{1}{\sqrt{\pi n}},$$

we have 

$$P_{00}^{(2n)} \sim \frac{1}{\pi n}.$$

Thus the series $N_{00} = \sum_n P_{00}^{(n)}$ dominates a multiple of $\sum_n 1/n$, $N_{00}$ 
must be infinite, and the process is recurrent. 
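The reduction of the multinomial sum to a squared binomial term can be checked in exact arithmetic. The sketch below is illustrative only (the function names are not from the text); it evaluates both expressions for $P_{00}^{(2n)}$ with rational numbers.

```python
from fractions import Fraction
from math import comb, factorial

def return_prob_2d_direct(n):
    """P_00^(2n) as 4**(-2n) times the sum of multinomials (2n; k, k, n-k, n-k)."""
    total = 0
    for k in range(n + 1):
        total += factorial(2 * n) // (factorial(k) ** 2 * factorial(n - k) ** 2)
    return Fraction(total, 4 ** (2 * n))

def return_prob_2d_closed(n):
    """P_00^(2n) as the square of the one-dimensional return probability."""
    return Fraction(comb(2 * n, n), 4 ** n) ** 2

if __name__ == "__main__":
    for n in range(6):
        print(n, return_prob_2d_direct(n), return_prob_2d_closed(n))
```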

An alternate proof that the process is recurrent in two dimensions 
proceeds as follows. If we introduce the new coordinates 

$$u = x + y, \qquad v = x - y,$$

then the two-dimensional symmetric random walk described in the 
coordinates $(u, v)$ executes two one-dimensional symmetric random 
walks independently of each other. Hence $P^{(2n)}_{(0,0),(0,0)}$ is the probability 
that $u = 0$ and $v = 0$ after $2n$ steps, which is 

$$\left[2^{-2n}\binom{2n}{n}\right]^2.$$

In three dimensions we calculate $N_{00}$. We have 

$$P_{00}^{(2n)} = 6^{-2n} \sum_{j+k\le n} \binom{2n}{j,\, j,\, k,\, k,\, n-j-k,\, n-j-k} = 2^{-2n}\binom{2n}{n} \sum_{j+k\le n} \left[3^{-n}\binom{n}{j,\, k,\, n-j-k}\right]^2.$$

The coefficients $\binom{n}{j,\, k,\, n-j-k}$ are dominated by the central term 

$$\binom{n}{n/3,\, n/3,\, n/3},$$

where the gamma function may be used for $(n/3)!$ if 3 does not divide $n$. 
This fact and the observation that the coefficients $\binom{n}{j,\, k,\, n-j-k}$ 
sum to $3^n$ imply that 

$$P_{00}^{(2n)} \le 2^{-2n}\binom{2n}{n}\, 3^{-n}\binom{n}{n/3,\, n/3,\, n/3}.$$

Summing on $n$ and using the approximations of Section 1-6a, we see 
that the series $N_{00} = \sum_n P_{00}^{(n)}$ is dominated by a multiple of $\sum_n n^{-3/2}$. 
Therefore $N_{00}$ is finite, and the process is transient. 
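The bound by the central multinomial term can be checked numerically. This is a sketch under the formulas above, with hypothetical function names; the gamma function (through `lgamma`) stands in for $(n/3)!$ when 3 does not divide $n$, as the text suggests.

```python
from math import comb, exp, lgamma, log

def return_prob_3d(n):
    """Exact P_00^(2n) for the three-dimensional symmetric walk."""
    squares = 0
    for j in range(n + 1):
        for k in range(n - j + 1):
            multinomial = comb(n, j) * comb(n - j, k)   # n! / (j! k! (n-j-k)!)
            squares += multinomial ** 2
    return comb(2 * n, n) * squares / 36 ** n

def central_bound(n):
    """Upper bound 2**(-2n) C(2n, n) 3**(-n) n! / ((n/3)!)**3 via the gamma function."""
    log_central = lgamma(n + 1) - 3 * lgamma(n / 3 + 1)
    log_binomial = log(comb(2 * n, n)) - 2 * n * log(2)
    return exp(log_binomial - n * log(3) + log_central)

if __name__ == "__main__":
    running = 0.0
    for n in range(41):
        running += return_prob_3d(n)
    # The partial sums stay bounded, in contrast to dimensions one and two.
    print("partial sum of N_00 through n = 40:", running)
    print("n = 30:", return_prob_3d(30), "<=", central_bound(30))
```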

If any higher-dimensional random walk were recurrent, then the 
process projected to a three-dimensional set would be recurrent, and 
the latter process watched only when it changes state would also be 
recurrent. But this last process is exactly the three-dimensional 
symmetric walk. We conclude therefore that the random walk in all 
dimensions greater than three is transient, and we have completed the 
proof of the following proposition. 



Proposition 7-9: The symmetric random walk is recurrent in dimen- 
sions one and two and is transient in all dimensions greater than or 
equal to three. 

In the transient case of dimension $n \ge 3$, the $j$th entry in the 0th 
column of the $N$-matrix is of the order of a constant times $|j|^{-(n-2)}$. 
We conclude this chapter by proving the weaker result that the entries 
of that column tend to zero, but our proof will be for a more general 
situation. 



Proposition 7-10: Let $P$ be a Markov chain with an infinite state 
space such that 

(1) $P$ is transient. 

(2) $P = P^T$. 

(3) $P$ has only finitely many non-zero entries in each row. 

Then 

$$\lim_{j\to\infty} H_{0j} = 0.$$

If, in addition, $P$ represents sums of independent random variables, 
then 

$$\lim_{j\to\infty} N_{0j} = 0.$$

In particular, the conclusions apply to the symmetric random walk in 
any dimension $n \ge 3$. 



Proof: We note that hypothesis (3) is equivalent to 

(3') For any state 0 and for any given $m$, there exist only finitely 
many states $j$ for which $P^{(k)}_{0j} > 0$ for some $k \le m$. 

If the conclusion about $H_{0j}$ is false, then for some $\epsilon > 0$ and for 
infinitely many $j$ we have $H_{0j} > \epsilon$. By (1), $N$ is finite-valued. 
Therefore, 

$$\lim_{k\to\infty} P^k N = \lim_{k\to\infty} [N - (I + \cdots + P^{k-1})] = 0$$

and 

$$\lim_{k\to\infty} P^k \bar H = 0$$

since $\bar H \le H \le N$. Choose $m$ large enough so that $(P^m \bar H)_{00} < \epsilon^2$. 
Since there are infinitely many $j$ such that $H_{0j} > \epsilon$, we can find by (3') 
such a $j$ with $P^{(k)}_{0j} = 0$ for all $k \le m$. Then 



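$$\Pr_0[\text{hit 0 after time } m] \ge \Pr_0[\text{ever hit } j \text{ and return to 0}] = H_{0j}H_{j0} = H_{0j}H_{0j} \quad \text{by (2)}$$

$$> \epsilon^2.$$

But 

$$\Pr_0[\text{hit 0 after time } m] = (P^m \bar H)_{00} < \epsilon^2,$$

a contradiction. Therefore $H_{0j} \to 0$. 

Finally if $P$ represents sums of independent random variables, 

$$N_{0j} = H_{0j}N_{jj} = H_{0j}N_{00}.$$

Hence $N_{0j} \to 0$. (Note that we really need only that $N_{jj}$ is bounded.) 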

As Markov chains the symmetric random walks have some special 
properties, reflecting corresponding special properties of Brownian 
motion. For instance, $\alpha$ is a constant for the random walk, and 
$P = P^T$. Consequently $\hat P = P$. We shall see that although many 
results of classical potential theory generalize to all transient and most 
recurrent chains, some will require further assumptions which happen 
to be true for symmetric random walks. 





CHAPTER 8 



TRANSIENT POTENTIAL THEORY 



1. Potentials 

In this chapter $P$ denotes a Markov chain all of whose states are 
transient, that is, a transient chain with no absorbing states. Every 
such chain has at least one (strictly) positive superregular measure, as 
we saw in Chapter 5; for example, the sum of $2^{-i}$ times the $i$th row of 
$N$ is such a measure. 

We select one such positive superregular measure, to be fixed 
throughout the chapter, and call it $\alpha$. All of transient potential theory 
will be relative to the distinguished vector $\alpha$. 

Let $\hat P$ be the $\alpha$-dual of $P$. Since all states are transient in $\hat P$ and 
since $\hat{\hat P} = P$, we see that $\hat P$ is the most general chain of the type we 
consider. The distinguished measure for $\hat P$ is taken to be the same $\alpha$. 

As an example, let $P$ be a transient Markov chain whose states 
communicate. Then $P\mathbf{1} \le \mathbf{1}$ and $0 < N = \sum_{k=0}^{\infty} P^k < \infty$. Every 
non-negative non-zero superregular row vector $\beta$ is positive, for if 
$\beta_j \ne 0$, then for every state $i$ and integer $k$ 

$$\beta_i \ge \sum_m \beta_m P^{(k)}_{mi} \ge \beta_j P^{(k)}_{ji}.$$

The right side must be positive for some $k$, since $j$ communicates with $i$. 
Thus in this special case any non-negative superregular row vector may 
be taken as $\alpha$; in particular, $\alpha$ may be taken as a row of $N$. 

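In the general case, if $P\mathbf{1} \ne \mathbf{1}$, we have defined the enlarged chain $\bar P$ 
by adding an absorbing state $a$ to $P$ and by setting 

$$\bar P_{ij} = P_{ij} \quad \text{if } i \ne a \text{ and } j \ne a,$$

$$\bar P_{ia} = 1 - \sum_k P_{ik},$$

$$\bar P_{ai} = \delta_{ai}.$$

If $P\mathbf{1} = \mathbf{1}$, we shall agree that $P$ is its own enlarged chain. 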
It will be convenient to say that the product of a row vector and a 
column vector is finite when we mean that it is well defined and finite. 
We recall the terminology function for column vector and signed 
measure for row vector. 

Definition 8-1: If $\mu$ is a signed measure with $\mu\mathbf{1}$ finite and if 

$$\nu = \lim_n\, [\mu(I + P + \cdots + P^{n-1})]$$

exists and is finite-valued, then $\mu$ is called a left charge with potential 
measure $\nu$. If $f$ is a function with $\alpha f$ finite and if 

$$g = \lim_n\, [(I + P + \cdots + P^{n-1})f]$$

exists and is finite-valued, then $f$ is called a right charge with potential 
function $g$. In either case a pure potential is a potential of a 
non-negative charge. 

The condition that $\alpha f$ be finite is the natural analog of the classical 
theory as described in Chapter 7. It states that $f$ is integrable with 
respect to the distinguished superregular measure $\alpha$. Similarly, the 
condition on $\mu$ is that the distinguished superregular function $\mathbf{1}$ be 
integrable with respect to $\mu$. 

Potential functions have a simple probabilistic interpretation in terms 
of games. If $f$ denotes a payment function in which $f_j$ is the payment 
a player receives each time he is in state $j$, then $P^n f$ denotes the expected 
payment on the $n$th step. Thus $(I + P + \cdots + P^{n-1})f$ is the expected total 
payment before time $n$, and the potential $g$ is the expected total 
payment in the long run. It is clear intuitively that $g_i$ should equal 
$\sum_j N_{ij}f_j$; we now prove this result. 

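Lemma 8-2: If $\mu\mathbf{1}$ is finite, then $\mu N$ is finite-valued. If $\alpha f$ is finite, 
then $Nf$ is finite-valued. 

Proof: We have 

$$N_{ij} = H_{ij}N_{jj} \le N_{jj}.$$

Thus 

$$|(\mu N)_j| \le \sum_i |\mu_i| N_{ij} \le N_{jj} \sum_i |\mu_i| < \infty.$$

For the second half let $\mu$ = dual $f$ and apply the first result to $\hat P$, 
noting that $\mu\mathbf{1} = \alpha f$. Then 

$$\alpha_j |(Nf)_j| = |(\mu \hat N)_j| \le \hat N_{jj} \sum_i |\mu_i| = N_{jj} \sum_i \alpha_i |f_i| < \infty.$$

Since $\alpha_j > 0$, $Nf$ is finite-valued. 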
Theorem 8-3: If $\alpha f$ is finite, then $f$ is a charge and its potential is 
$g = Nf$. 

Proof: By Lemma 8-2, $Nf$ is well defined and finite-valued. Hence 
so are both $Nf^+$ and $Nf^-$. By monotone convergence, 

$$\lim_n\, [(I + P + \cdots + P^{n-1})f^+] = Nf^+$$

and 

$$\lim_n\, [(I + P + \cdots + P^{n-1})f^-] = Nf^-.$$

Thus 

$$\lim_n\, [(I + P + \cdots + P^{n-1})f] = Nf^+ - Nf^- = Nf.$$

Thus $f$ is a charge if and only if it is integrable with respect to $\alpha$, and 
$N$ is the potential operator that transforms a charge into its potential. 
In particular, $f$ is a charge for $P$ if and only if it is a charge for $\hat P$. 
We shall now show that $I - P$ is the inverse operator. 

Theorem 8-4: If $g$ is a potential, then $(I - P)g$ is its charge. 

Proof: Let $f$ be a charge with potential $g$. By Theorem 8-3, $g = Nf$. 
Hence, by Lemma 5-9, $(I - P)g = f$. 
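Theorems 8-3 and 8-4 can be illustrated on a small example. The sketch below is not from the text: the four-state substochastic matrix `P`, the charge `f`, and the function names are all hypothetical choices. It approximates $g = Nf$ by the partial sums in Definition 8-1 and then recovers the charge as $(I - P)g$.

```python
# Hypothetical 4-state transient chain: every row sum is below 1,
# so the process disappears with positive probability at each step.
P = [
    [0.00, 0.50, 0.00, 0.00],
    [0.25, 0.00, 0.50, 0.00],
    [0.00, 0.25, 0.00, 0.50],
    [0.00, 0.00, 0.25, 0.00],
]

def mat_vec(A, v):
    """Product of a square matrix and a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def potential(P, f, steps=500):
    """Partial sum (I + P + ... + P^(steps-1)) f, approximating g = N f."""
    g = [0.0] * len(f)
    term = list(f)
    for _ in range(steps):
        g = [gi + ti for gi, ti in zip(g, term)]
        term = mat_vec(P, term)
    return g

def charge(P, g):
    """Apply I - P to the column vector g."""
    return [gi - pgi for gi, pgi in zip(g, mat_vec(P, g))]

if __name__ == "__main__":
    f = [1.0, -2.0, 0.0, 3.0]          # a signed charge (payment function)
    g = potential(P, f)
    print("potential g      =", g)
    print("recovered charge =", charge(P, g))
```

On a finite state space every function is a charge, so the example only illustrates the inverse relationship between $N$ and $I - P$, not the integrability condition.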

Therefore, there is a one-to-one correspondence between charges and 
potentials. Note that Theorem 8-4 implies that a potential is regular 
at all states where the charge is zero. 

The method used to derive the second half of Lemma 8-2 is of general 
importance. We prove a result for all our $P$'s for signed measures 
(or functions). We apply the result to $\hat P$ and obtain a corresponding 
result for functions (or signed measures). Then since $\hat P$ is the form of 
the most general transient chain being considered, the new result holds 
for all $P$'s. Such results will loosely be described as duals. 

The duals of Theorems 8-3 and 8-4 state that a signed measure $\mu$ is a 
charge if and only if $\mu\mathbf{1}$ is finite. Its potential is $\nu = \mu N$, and 
$\mu = \nu(I - P)$. 

From now on we shall prove theorems only for functions; the dual 
results for signed measures can always be proved by the indicated 
method. The key to the success of the method is that the dual of a 
right charge for $P$ is a left charge for $\hat P$, and the dual of a potential 
function for $P$ is a potential measure for $\hat P$. 

From Theorem 8-3 we see immediately that the class of potentials is 
quite extensive. We can even prove that there exists a strictly positive 
pure potential, a result we shall need later on. 
Proposition 8-5: There exists a strictly positive pure potential. 

Proof: Number the states and let $f_j = 2^{-j}/\alpha_j$. Then 

$$\alpha f = \sum_j 2^{-j} = 1,$$

so that $f$ is the charge of a pure potential $g$, by Theorem 8-3. 
Furthermore, 

$$g_i = \sum_j N_{ij}f_j \ge N_{ii}f_i \ge f_i > 0,$$

so that $g$ is strictly positive. 

For many purposes it is sufficient in studying potentials to consider 
only pure potentials. The reason for this simplification is the following. 

Proposition 8-6: Any potential may be represented as the difference 
of two pure potentials. 

Proof: Write $g = Nf = Nf^+ - Nf^-$. 

Note by Theorem 8-4 that a potential is superregular if and only if 
it is a pure potential. 

We recall from Theorem 5-10 that a non-negative superregular 
function $h$ is uniquely representable as $h = Nf + r$ with $r$ regular. 
In the representation, $r = \lim_n P^n h \ge 0$ and $f = (I - P)h \ge 0$. 
(The dual of this result allows the unique representation of a 
non-negative superregular measure $\pi$ as $\pi = \mu N + \rho$ with $\rho$ regular. In 
this representation, $\rho = \lim_n \pi P^n \ge 0$ and $\mu = \pi(I - P)$.) This result 
is the analog of a classical theorem due to F. Riesz: In any open set of 
Euclidean space which corresponds to a transient version of Brownian 
motion, any non-negative superharmonic function is uniquely the sum 
of a pure potential for the region and a non-negative harmonic function. 
The pure potential may have infinite total charge. We now generalize 
the Markov chain result, and in so doing we obtain a useful necessary 
condition that potentials must satisfy. 

Proposition 8-7: If $(I - P)h = f$, if $h$ is finite-valued, and if $Nf$ is 
finite-valued, then $h$ has a representation in the form 

$$h = Nf + r$$

with $r$ regular. The vector $r$ satisfies $r = \lim_n P^n h$. 








Proof: Set $r = h - Nf$. Then 

$$(I - P)(h - Nf) = (I - P)h - (I - P)(Nf)$$

$$= (I - P)h - f \quad \text{by Lemma 5-9}$$

$$= f - f = 0.$$

Hence $r$ is regular. Now 

$$f = (I - P)h = h - Ph$$

or 

$$h = Ph + f.$$

Since $Nf$ is finite-valued and since $P^n f^+ \le Nf^+$ and $P^n f^- \le Nf^-$, 
$P^n f$ is finite-valued for every $n$. By induction we see that 

$$P^{k-1}h = P^k h + P^{k-1}f$$

and that $P^k h$ is finite-valued. Summing for $k = 1, \ldots, n$, we obtain 

$$h = P^n h + (I + P + \cdots + P^{n-1})f.$$

By dominated convergence the second term tends to $Nf$. Hence 

$$h = \lim_n P^n h + Nf.$$

Corollary 8-8: If $(I - P)h = f$, if $h$ is finite-valued, and if $\alpha f$ is finite, 
then $h$ is a potential if and only if $\lim_n P^n h = 0$. 

Proof: By Theorem 8-3, $f$ is a charge and $Nf$ is its potential. Apply 
Proposition 8-7 and write 

$$h = Nf + \lim_n P^n h.$$

If $\lim P^n h = 0$, then $h$ is the potential of $f$. Conversely, if $\lim P^n h \ne 0$, 
then $h$ cannot be a potential because, by Theorem 8-4, it would have to 
have $f$ as its charge. 

Corollary 8-9: If $g$ is a potential, then $\lim_n P^n g = 0$. 

Proof: Take $h = g$ in Corollary 8-8. 

In the discrete analog of the classical case (three-dimensional 
symmetric random walk) every potential $g$ is bounded and satisfies 
$\lim_i g_i = 0$. In our theory we obtain only the weaker result, Corollary 
8-9; that $g$ may be unbounded will be shown in Section 7. 

The stronger results of the classical theory are due to special features, 
as the next proposition shows. In the classical case $\alpha$ is chosen as $\mathbf{1}^T$ 
and $N_{ii}$ is independent of $i$. Hence, $\alpha_i \ge kN_{ii}$ for some positive 
constant $k$. Furthermore, $\lim_i N_{ij} = 0$ by Proposition 7-10. 



Proposition 8-10: If $\alpha$ is chosen so that $\alpha_i \ge kN_{ii}$ for all $i$ with $k$ a 
positive constant, then all potentials are bounded. If, in addition, 
$\lim_i N_{ij} = 0$, then $\lim_i g_i = 0$. 

Proof: The dual of $N_{ij} \le N_{jj}$ is $N_{ij} \le (N_{ii}/\alpha_i)\alpha_j$. If $\alpha_i \ge kN_{ii}$, then 

$$|g_i| \le \sum_j N_{ij}|f_j| \le \frac{N_{ii}}{\alpha_i} \sum_j \alpha_j |f_j| \le \frac{1}{k}\, \alpha|f|.$$

Thus $g$ is bounded. Now suppose $\lim_i N_{ij} = 0$. By Proposition 8-6 
we may assume that $g$ is a pure potential. Define a sequence of 
functions $h^{(i)}_j = N_{ij}/\alpha_j$ and a measure $\mu_j = \alpha_j f_j$. Then $\mu$ is a finite measure, 
and 

$$h^{(i)}_j \le \frac{N_{ii}}{\alpha_i} \le \frac{1}{k}$$

is bounded independently of $i$ and $j$. Hence, by dominated 
convergence, 

$$\lim_i g_i = \lim_i \sum_j h^{(i)}_j \mu_j = \sum_j \left(\lim_i h^{(i)}_j\right) \mu_j = 0.$$

Both conditions of Proposition 8-10 hold for the basic example with $\alpha$ 
chosen as $\beta$. However, only the first condition holds for the reverse of 
the basic example (see Section 6). In Section 7 we shall see an example 
where both conditions fail. 



2. The h-process and some applications 

Duality is a transformation which interchanges the roles of row and 
column vectors. Our purpose now is to describe a useful transformation 
of transient chains into new transient chains in which row and column 
vectors are transformed into vectors of the same type. 

Definition 8-11: Let $h$ be a positive finite-valued superregular 
function for a transient chain $P$. The $h$-process is a Markov chain $P^*$ 
with transition probabilities 

$$P^*_{ij} = \frac{P_{ij}h_j}{h_i}.$$

It is left to the reader to verify that $P^*$ is a transition matrix, that all 
states are transient, and that if the states of $P$ communicate, the same 
is true for $P^*$. Let $U$ be a diagonal matrix with diagonal entries 
$1/h_i$. Then $P^* = UPU^{-1}$. 
Definition 8-12: The $h$-process transformation is a transformation 
defined on square matrices $Y$, row vectors $\pi$, and column vectors $f$ by 

$$Y^* = UYU^{-1}$$

$$\pi^* = \pi U^{-1}$$

$$f^* = Uf.$$

The $h$-process transformation yields results similar to those with 
duality. If $P$ is a transient chain, then $P^*$ also is transient. 
Moreover, powers of $P$ transform to the $P^*$ process the same way that $P$ 
does, and the fundamental matrix for $P^*$ is $N^*$. Sums and products 
are preserved in their given order; and equalities, inequalities, and 
limits are preserved entry-by-entry. Any superregular function (or 
signed measure) for $P$ transforms into a superregular function (or signed 
measure) for $P^*$. 

If $\alpha$ is the distinguished superregular measure for $P$, we select $\alpha^*$ as 
the distinguished superregular measure for $P^*$. Then $\alpha^* f^* = \alpha f$ and 
$N^* f^* = (Nf)^*$. Hence if $f$ is a right charge with potential $g$ in $P$, then 
$f^*$ is a right charge for $P^*$ with potential $g^*$. 
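The $h$-process transformation is easy to carry out on a finite example. The following sketch uses a hypothetical four-state substochastic matrix and takes $h = N\mathbf{1}$, a strictly positive pure potential; the names `h_transform` and `fundamental_matrix` are illustrative, and $N$ is approximated by a long partial sum of powers of $P$.

```python
# Hypothetical 4-state transient chain (row sums below 1).
P = [
    [0.00, 0.50, 0.00, 0.00],
    [0.25, 0.00, 0.50, 0.00],
    [0.00, 0.25, 0.00, 0.50],
    [0.00, 0.00, 0.25, 0.00],
]

def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def fundamental_matrix(P, steps=500):
    """Approximate N = I + P + P^2 + ... by a partial sum."""
    n = len(P)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    N = [row[:] for row in term]
    for _ in range(steps):
        term = mat_mul(term, P)
        N = [[N[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return N

def h_transform(M, h):
    """Entries M[i][j] * h[j] / h[i], that is, U M U^-1 with U = diag(1/h)."""
    n = len(M)
    return [[M[i][j] * h[j] / h[i] for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    N = fundamental_matrix(P)
    h = [sum(row) for row in N]            # h = N 1, a positive pure potential
    P_star = h_transform(P, h)
    print("row sums of P*:", [sum(row) for row in P_star])
    print("N* row 0      :", h_transform(N, h)[0])
```

Because products are preserved, the fundamental matrix computed directly from `P_star` agrees with the transformed matrix `h_transform(N, h)`.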

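If we decompose $P$, with rows and columns ordered as $E$, $\tilde E$, as 

$$P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix},$$

then 

$$P^* = \begin{pmatrix} T^* & U^* \\ R^* & Q^* \end{pmatrix}.$$

From this decomposition we see that $(B^E)^*$ is the $B^E$-matrix of the 
$P^*$-chain because $Q$ and $R$ transform into $Q^*$ and $R^*$, because products are 
preserved, and because $I^* = I$. 

We shall now give some applications of the $h$-process. 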

Definition 8-13: The support of a charge is the set on which the charge 
is not 0; the support of a potential is the support of its charge. A 
charge or potential is said to have support in $E$ if its support is a subset 
of $E$. 

The function $\mathbf{1}$ is always superregular, and hence by the 
representation theorem $\mathbf{1} = Nf + r$, where $f = (I - P)\mathbf{1}$ and $r$ is regular. That 
is, 

$$f_i = 1 - (P\mathbf{1})_i = \bar P_{ia}$$

in the enlarged chain. Moreover, 

$$(Nf)_i = \sum_j N_{ij}\bar P_{ja} = \bar B_{ia}.$$

Thus $r_i = 1 - \bar B_{ia}$ is the probability that the $P$-process, started at $i$, 
continues indefinitely. The enlarged chain is absorbing if and only if 

$$\mathbf{1} = Nf.$$

Proposition 8-14: Let $g > 0$ be a pure potential with support in $E$, 
and let $P^*$ be the $h$-process for $h = g$. Then $g^* = \mathbf{1}$ is a potential, $P^*$ 
is absorbing, and $(B^E)^*\mathbf{1} = \mathbf{1}$. 

Proof: Potentials transform into potentials; hence $g^* = \mathbf{1}$ is a 
potential. Since $\mathbf{1}$ is then of the form $N^* f^*$, $P^*$ is absorbing. The 
absorbing state $a$ can be reached only from a state $i$ such that $(P^*\mathbf{1})_i < 1$; 
for such a state $i$, 

$$0 < [(I - P^*)\mathbf{1}]_i = [(I - P^*)g^*]_i = f^*_i.$$

Hence $f_i > 0$ and $i$ must be in $E$. Thus the $P^*$-process with probability 
one reaches $E$ from all states, and $(B^E)^*\mathbf{1} = \mathbf{1}$. 

What underlies Proposition 8-14 is this: The $h$-process tends to 
follow paths along which $h$ is large. But since potentials tend to zero 
on the average ($P^n h \to 0$ for a potential), if $h$ is a potential, then the 
paths in the $h$-process disappear. See Chapter 10 for details. 

Proposition 8-15: If $h$ is a non-negative finite-valued superregular 
function, then $B^E h \le h$ for any set $E$. 

Proof: First suppose that $h > 0$. Form the $h$-process; then 
$h^* = \mathbf{1}$. Since $(B^E)^*\mathbf{1} \le \mathbf{1}$, we have $B^E h \le h$. (The conclusion that 
an inequality for the $h$-process implies an inequality for the original 
process is one we shall draw frequently. If it were false, then the 
inequality $B^E h \le h$ would fail in some entry. But the $h$-process 
transformation preserves inequalities entry-by-entry.) 

Now suppose that $h$ has some zero entries. Apply the special case 
above to the function $h + \epsilon\mathbf{1}$. Then 

$$B^E(h + \epsilon\mathbf{1}) \le h + \epsilon\mathbf{1}.$$

Letting $\epsilon$ tend to zero, we obtain $B^E h \le h$. 

Proposition 8-16: If $h$ is a non-negative superregular function and if 
$E$ is any set of states, then $\bar h = B^E h$ satisfies the following: 

(1) $\bar h \le h$ and $\bar h_E = h_E$. 

(2) $\bar h$ is the pointwise infimum of all non-negative superregular 
functions which dominate $h$ on $E$. 

(3) If $\bar f = (I - P)\bar h$, then 

$$\bar f_E = (I - P^E)h_E$$

and 

$$\bar f_{\tilde E} = 0.$$

Therefore, $\bar h$ is regular on $\tilde E$ and is superregular everywhere. 

(4) If $E \subset F$, then $B^E h \le B^F h$. 

Proof: Statement (1) follows from Proposition 8-15 and the fact that 
$B^E_{ij} = \delta_{ij}$ for $i$ in $E$. For (2) let $x$ be a non-negative superregular 
function such that $x_E \ge h_E$. Since the $\tilde E$ columns of $B^E$ are zero, 

$$x \ge B^E x \ge B^E h = \bar h,$$

and (2) holds provided we show in (3) that $\bar h$ is superregular. For (3), 
since $P\bar h \le Ph \le h$ is finite-valued, 

$$\bar f = (I - P)(B^E h) = [(I - P)B^E]h = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} h_E \\ h_{\tilde E} \end{pmatrix}.$$

But $h_E$ is $P^E$-superregular by the dual of Lemma 6-7, and (3) follows. 
Finally for (4), if $E \subset F$, then by conclusion (6) of Proposition 5-8, 

$$B^E h = B^E(B^F h) \le B^F h.$$

We now prove two lemmas and a proposition which conclude that a 
charge and its potential may both be computed from a knowledge of 
the values of the potential on the support. The first lemma is interest- 
ing in itself because of its game interpretation, which we shall discuss 
after proving the result. 

Lemma 8-17: For any set of states $E$, 

$$N = B^E N + {}^E N.$$

If $g$ is a potential with charge $f$, then 

$$g = B^E g + {}^E N f.$$

Proof: In Theorem 4-11, let $\mathbf{f}_j$ be the number of times in $j$ when and 
after $E$ is reached (or 0 if $E$ is not reached), and let $\mathbf{t}$ be the time when 
$E$ is reached (or $+\infty$ if $E$ is not reached). Then Theorem 4-11 yields 

$$M_i[\mathbf{f}_j] = \sum_k \Pr_i[x_{\mathbf t} = k]\, M_k[\mathbf{n}_j] = \sum_k B^E_{ik} N_{kj}.$$

But $\mathbf{f}_j$ is the difference of the total number of times the process is in $j$ 
and the number of times it is in $j$ before reaching $E$. Hence 

$$M_i[\mathbf{f}_j] = N_{ij} - {}^E N_{ij},$$

and the first equation follows. To get the second equation we multiply 
through by $f$; associativity in $B^E N f$ holds because $B^E N|f|$ is 
finite-valued. 
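The decomposition $N = B^E N + {}^E N$ can be verified numerically on a finite chain. The sketch below is illustrative and rests on stated assumptions: the matrix `P` is a hypothetical four-state substochastic chain, ${}^E N$ is computed as the fundamental matrix of the chain killed on entering $E$, and $B^E$ is obtained from it by one further step into $E$.

```python
# Hypothetical 4-state transient chain (row sums below 1).
P = [
    [0.00, 0.50, 0.00, 0.00],
    [0.25, 0.00, 0.50, 0.00],
    [0.00, 0.25, 0.00, 0.50],
    [0.00, 0.00, 0.25, 0.00],
]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def fundamental_matrix(P, steps=500):
    """Approximate I + P + P^2 + ... by a partial sum."""
    n = len(P)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    N = [row[:] for row in term]
    for _ in range(steps):
        term = mat_mul(term, P)
        N = [[N[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return N

def hitting_and_taboo(P, E):
    """Return (B, taboo_N): entry distribution into E and visits before reaching E."""
    n = len(P)
    outside = [i for i in range(n) if i not in E]
    # Keep only transitions between states outside E (the chain killed on E).
    Q = [[P[i][j] if (i not in E and j not in E) else 0.0 for j in range(n)]
         for i in range(n)]
    full = fundamental_matrix(Q)
    taboo_N = [[full[i][j] if (i not in E and j not in E) else 0.0
                for j in range(n)] for i in range(n)]
    B = [[0.0] * n for _ in range(n)]
    for i in E:
        B[i][i] = 1.0                      # E is reached at time 0
    for i in outside:
        for j in E:
            B[i][j] = sum(taboo_N[i][k] * P[k][j] for k in outside)
    return B, taboo_N

if __name__ == "__main__":
    E = {1}
    N = fundamental_matrix(P)
    B, taboo_N = hitting_and_taboo(P, E)
    BN = mat_mul(B, N)
    print("N[0]            =", N[0])
    print("(B N + taboo)[0] =", [BN[0][j] + taboo_N[0][j] for j in range(4)])
```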

In the game interpretation of potentials, $f_j$ is a payment received 
each time the process is in state $j$, and $g_i$ is the expected total gain if the 
process is started in $i$. The second equation of Lemma 8-17 states that 
$g$ is the expected gain when and after $E$ is entered plus the expected 
gain before $E$ is entered. If all the payments are non-negative, then it 
is obvious from this interpretation that $g \ge B^E g$. If the support of $f$ 
is in $E$, then all non-zero payments occur in $E$, and the expected gain 
before reaching $E$ is zero. Hence, as we shall see formally in Proposition 
8-19, $g = B^E g$. 



Lemma 8-18: The fundamental matrix for $P^E$ is $N_E$. 

Proof: The assertion is probabilistically clear because the number of 
times the process $P^E$ is in a state of $E$ when watched only in states of $E$ 
is the same as the number of times the process $P$ is in a state of $E$. 

Proposition 8-19: If $g$ is a potential with support in $E$, then $g_E$ 
determines $g$, $g = B^E g$, $f_E = (I - P^E)g_E$, and $g_E = N_E f_E$. 

Proof: The fact that $g = B^E g$ is immediate from Lemma 8-17. 
Hence $g_E$ determines $g$. Since $g = B^E g$, we have $f_E = (I - P^E)g_E$ by 
conclusion (3) of Proposition 8-16. Finally $g_E = N_E f_E$ either by Lemma 
8-18 or by direct calculation: 

$$\begin{pmatrix} g_E \\ g_{\tilde E} \end{pmatrix} = \begin{pmatrix} N_E & N_2 \\ N_3 & N_4 \end{pmatrix} \begin{pmatrix} f_E \\ 0 \end{pmatrix} = \begin{pmatrix} N_E f_E \\ N_3 f_E \end{pmatrix}.$$

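Next we shall prove that the columns of $B^E$ are always potentials. 

Proposition 8-20: For any set of states $E$ the columns of $B^E$ are 
potentials with support in $E$, and 

$$B^E = N \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}.$$

Proof: Since 

$$\alpha \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} \alpha_E(I - P^E) & 0 \end{pmatrix}$$

is finite-valued, each column of $N \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}$ is a potential with 
support in $E$. Thus by Proposition 8-19, 

$$N \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} = B^E N \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}$$

$$= B^E \begin{pmatrix} N_E(I - P^E) & 0 \\ N_3(I - P^E) & 0 \end{pmatrix} \quad \text{with } N = \begin{pmatrix} N_E & N_2 \\ N_3 & N_4 \end{pmatrix}$$

$$= B^E \begin{pmatrix} I & 0 \\ N_3(I - P^E) & 0 \end{pmatrix} \quad \text{by Lemma 8-18}$$

$$= B^E.$$

Corollary 8-21: For any set $E$, $\lim_n P^n B^E = 0$. 

Proof: Apply Proposition 8-20 and Corollary 8-9. 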

Finally we work toward a proof that a non-negative superregular 
function dominated by a potential is a potential, a result we state as 
Proposition 8-25. 

Lemma 8-22: If $E$ is a finite set and if $h$ is non-negative superregular, 
then $B^E h$ is a pure potential of finite support. 

Proof: $B^E h$ is a finite linear combination of columns of $B^E$ and is 
therefore by Proposition 8-20 a potential with support in the finite set 
$E$. Since $h$ is non-negative superregular, $B^E h$ is non-negative 
superregular by conclusion (3) of Proposition 8-16. Hence, $B^E h$ is a pure 
potential. 

Proposition 8-23: Every non-negative superregular function is the 
limit of an increasing sequence of pure potentials of finite support. 

Proof: Let $E_1 \subset E_2 \subset E_3 \subset \cdots$ be an increasing sequence of finite 
sets with union the set of all states $S$, and let $h^{(n)} = B^{E_n}h$. Then 
$h^{(n)}$ is an increasing sequence of pure potentials of finite support by 
Lemma 8-22 and conclusion (4) of Proposition 8-16. If $i$ is in $E_n$, 
then $h^{(n)}_i = h_i$, so that $\lim_n h^{(n)} = h$. 

If $g$ is a potential with charge $f$, then the total charge of $g$ (or $f$) is 
defined to be $\alpha f$ (see Section 7-5). 

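Lemma 8-24: If $N\bar f \le Nf$ with $\bar f \ge 0$ and $\alpha f$ finite, then 

$$0 \le \alpha\bar f \le \alpha f < \infty.$$

Proof: By the dual of Proposition 8-23, we may find a sequence of 
finite measures $\pi^{(n)}$ such that $\alpha$ is the monotone limit of $\pi^{(n)}N$. Since 
$N\bar f \le Nf$, we have $\pi^{(n)}(N\bar f) \le \pi^{(n)}(Nf)$ and 

$$\lim_n \pi^{(n)}(N\bar f) \le \lim_n \pi^{(n)}(Nf).$$

Since $\bar f \ge 0$, 

$$\pi^{(n)}(N\bar f) = (\pi^{(n)}N)\bar f,$$

and 

$$\lim_n \pi^{(n)}(N\bar f) = \left(\lim_n \pi^{(n)}N\right)\bar f = \alpha\bar f$$

by monotone convergence with $\bar f$ as the measure. And since $\pi^{(n)}Nf^+ \le 
\alpha f^+ < \infty$ and $\pi^{(n)}Nf^- \le \alpha f^- < \infty$, 

$$\pi^{(n)}(Nf) = \pi^{(n)}Nf^+ - \pi^{(n)}Nf^-.$$

Hence 

$$\lim_n \pi^{(n)}(Nf) = \alpha f^+ - \alpha f^- = \alpha f$$

by monotone convergence for each term. Thus $\alpha\bar f \le \alpha f$; $\alpha\bar f \ge 0$ since 
$\bar f \ge 0$, and $\alpha f < \infty$ by hypothesis. 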

Proposition 8-25: If $h$ is a non-negative superregular function 
dominated by a potential $g$, then $h$ is a potential and its total charge is 
no greater than the total charge of $g$. 

Proof: Let $g = Nf$. Write $h = N\bar f + \lim P^n h$ with $\bar f \ge 0$. Since 
$0 \le h \le g$, we have $0 \le P^n h \le P^n g$. But $P^n g \to 0$ by Corollary 8-9, 
so that $P^n h \to 0$ and $h = N\bar f$. Since $\alpha|f| < \infty$, we have, by Lemma 
8-24, $0 \le \alpha\bar f \le \alpha f < \infty$. Hence $h$ is a potential and $\alpha\bar f \le \alpha f$. 

Corollary 8-26: A non-negative potential $g = Nf$ has non-negative 
total charge. 

Proof: Let $g = Nf \ge 0$, and set $\bar f = 0$. Since $\alpha f$ is finite, $\alpha f \ge \alpha\bar f = 0$ 
by Lemma 8-24. 
Proposition 8-25 has an interesting interpretation in terms of the 
enlarged chain. The $j$th column of $N$ is a potential with charge 
$\{\delta_{ij}\}$. If the infimum of the column is positive, then the column 
dominates a constant function $k\mathbf{1}$, so that $\mathbf{1}$ is a potential by Proposition 
8-25. Hence, the infimum of every column of $N$ is zero unless the 
enlarged chain is absorbing. In particular, if $P\mathbf{1} = \mathbf{1}$, then the 
infimum of every column of $N$ is 0. 

In the case of the symmetric random walk in three dimensions, 
$P\mathbf{1} = \mathbf{1}$. Thus the infimum of every column of $N$ is zero, and since $P$ 
is symmetric, the infimum of every row is zero. This fact, although 
not providing a proof of Proposition 7-10, does give us more insight into 
that result. 

3. Equilibrium sets and capacities 

In proving analogs in the next section to the classical potential 
principles, we shall need to restrict the supports of the charges involved. 
The notion we shall need is that of an equilibrium set. 

Definition 8-27: A set $E$ is an equilibrium set for $P$ if there is a pure 
potential which assumes the value 1 at every point of $E$ and which has 
support in $E$. Such a potential is called an equilibrium potential for $E$. 
A set $E$ is a dual equilibrium set for $P$ if there is a pure potential measure 
with support in $E$ which equals $\alpha$ on $E$. 

We proceed to give two characterizations of equilibrium potentials. 

Proposition 8-28: A set $E$ is an equilibrium set if and only if both 

(1) $\alpha e^E < \infty$ and 

(2) for any starting distribution the set $E$ is entered only finitely 
often a.e. 

When $E$ is an equilibrium set, the hitting vector $h^E$ is the unique 
equilibrium potential and its charge is the escape vector $e^E$. 

Proof: Suppose $E$ is an equilibrium set. If $x$ is an equilibrium 
potential for $E$, then $B^E x = x$ by Proposition 8-19 and $x_E = \mathbf{1}$ by 
definition. Since $x_{\tilde E}$ does not affect the value of $B^E x$, we have 

$$x = B^E x = B^E \mathbf{1} = h^E,$$

and $h^E$ must be the equilibrium potential. Its charge is 

$$(I - P)h^E = e^E$$

by Theorem 8-4 and Proposition 5-8. Thus $E$ is an equilibrium set if 
and only if $h^E$ is a potential or if and only if $\alpha e^E < \infty$ and $\lim P^n h^E = 0$. 
But $\lim_n P^n h^E$ is the vector of probabilities of being in $E$ infinitely 
often, by conclusion (7) of Proposition 5-8. 

Corollary 8-29: All finite sets are equilibrium sets. 

Proof: Apply Propositions 4-28 and 8-28. 

Proposition 8-30: If $E$ is an equilibrium set, then the equilibrium 
potential is the pointwise infimum of all pure potentials which dominate 
1 on $E$. 

Proof: Since $h^E = B^E \mathbf{1}$, the result follows from conclusion (2) of 
Proposition 8-16. 

We shall use the notation $\eta^E$ for dual $e^E$. 

Definition 8-31: If $E$ is an equilibrium set, the capacity of $E$ is defined 
by $C(E) = \alpha e^E = \eta^E \mathbf{1}$. 

In terms of total charge, Definition 8-31 states that the capacity of 
an equilibrium set is to be the total charge of the equilibrium potential. 
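On a finite chain every set is an equilibrium set, so hitting vectors, escape vectors, and capacities can be computed directly. The sketch below is illustrative only: `P` is a hypothetical four-state substochastic chain, $\alpha$ is taken to be row 0 of $N$ (one admissible choice when the states communicate), and the function names are not from the text. With this choice of $\alpha$, the capacity $\alpha e^E = (N e^E)_0$ coincides with the hitting probability $h^E_0$, which gives a convenient check.

```python
# Hypothetical 4-state transient chain (row sums below 1).
P = [
    [0.00, 0.50, 0.00, 0.00],
    [0.25, 0.00, 0.50, 0.00],
    [0.00, 0.25, 0.00, 0.50],
    [0.00, 0.00, 0.25, 0.00],
]

def fundamental_matrix(P, steps=500):
    """Approximate N = I + P + P^2 + ... by a partial sum."""
    n = len(P)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    N = [row[:] for row in term]
    for _ in range(steps):
        term = [[sum(term[i][k] * P[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
        N = [[N[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return N

def hitting_vector(P, E, steps=500):
    """h^E: probability of ever being in E, by iterating h = 1 on E, h = P h off E."""
    n = len(P)
    h = [1.0 if i in E else 0.0 for i in range(n)]
    for _ in range(steps):
        Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
        h = [1.0 if i in E else Ph[i] for i in range(n)]
    return h

def escape_vector(P, E):
    """e^E = (I - P) h^E: zero off E, probability of never returning on E."""
    h = hitting_vector(P, E)
    n = len(P)
    return [1.0 - sum(P[i][j] * h[j] for j in range(n)) if i in E else 0.0
            for i in range(n)]

def capacity(alpha, P, E):
    """C(E) = alpha e^E, the total charge of the equilibrium potential."""
    return sum(a * x for a, x in zip(alpha, escape_vector(P, E)))

if __name__ == "__main__":
    N = fundamental_matrix(P)
    alpha = N[0]
    for E in ({1}, {1, 2}, {0, 1, 2, 3}):
        print(sorted(E), capacity(alpha, P, E), hitting_vector(P, E)[0])
```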

Lemma 8-32: A set $E$ is an equilibrium set if and only if both 
$(P^E)^n \mathbf{1} \to 0$ and $\alpha_E[(I - P^E)\mathbf{1}] < \infty$. If $E$ is an equilibrium set, then 

$$C(E) = \alpha_E[(I - P^E)\mathbf{1}] = [\alpha_E(I - \hat P^E)]\mathbf{1}.$$

Proof: We shall apply Proposition 8-28. $[(P^E)^n \mathbf{1}]_i$ is the probability 
starting in $i \in E$ of returning to $E$ at least $n$ times. Thus $(P^E)^n \mathbf{1} \to 0$ 
is a necessary and sufficient condition for being in $E$ only finitely often 
a.e. for any starting distribution. Secondly $(I - P^E)\mathbf{1} = e^E_E$ and 
$\alpha_E e^E_E = \alpha e^E$. Hence $\alpha e^E$ is finite if and only if $\alpha_E[(I - P^E)\mathbf{1}] < \infty$. 
And if $E$ is an equilibrium set, then 

$$C(E) = \alpha_E[(I - P^E)\mathbf{1}].$$

Under duality a number is transformed into itself. Hence 

$$C(E) = [(\text{dual } \mathbf{1})(\text{dual } (I - P^E))](\text{dual } \alpha_E) = [\alpha_E(I - \hat P^E)]\mathbf{1}.$$


Proposition 8-33: $E$ is an equilibrium set if and only if $\mathbf{1}$ is a potential 
for $P^E$ with $\alpha_E$ as the distinguished measure. Also $C(E)$ is the same 
computed for $P$ as for $P^E$. 

Proof: $\mathbf{1}$ is always superregular for $P^E$. The two conditions given 
by Lemma 8-32 are precisely the conditions that $\mathbf{1}$ be a potential. Also 
$\alpha_E[(I - P^E)\mathbf{1}]$ is the capacity of $E$ in $P^E$. 

Proposition 8-34: $E$ is an equilibrium set for $P$ if and only if it is a 
dual equilibrium set for $\hat P$. When $E$ is such a set, $\alpha_E = \eta^E_E \hat N_E$. 

Proof: $E$ is an equilibrium set for $P$ if and only if $\mathbf{1}$ is a potential for 
$P^E$ with $\mathbf{1} = N_E e^E_E$. By duality, this condition is equivalent to the 
assertion that $\alpha_E$ is a potential measure for $\hat P^E$ with $\alpha_E = \eta^E_E \hat N_E$. The 
result then follows from the dual of Proposition 8-33. 

We would like the result that $C(E) = \hat C(E)$. However, an equilibrium 
set for $P$ need not be an equilibrium set for $\hat P$ (see Section 6). 
Therefore, the following is the best possible result: 

Proposition 8-35: If $E$ is an equilibrium set for both $P$ and $\hat P$, then 
$C(E) = \hat C(E)$. 

Proof: By Proposition 8-34, we have $\alpha_E = \hat\eta^E_E N_E$, so that $\hat\eta^E_E = 
\alpha_E(I - P^E)$ by Lemma 8-18. By Lemma 8-32 applied to $\hat P$, 

$$C(E) = \alpha_E e^E_E = (\hat\eta^E_E N_E)e^E_E = \hat\eta^E_E(N_E e^E_E) = \hat\eta^E_E \mathbf{1} = [\alpha_E(I - P^E)]\mathbf{1} = \hat C(E).$$

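Proposition 8-36: If $F$ is a dual equilibrium set and $E \subset F$, then $E$ is 
a dual equilibrium set and $\hat C(E) = \hat\eta^F h^E$. 

Proof: We shall use Proposition 8-28 to prove that $E$ is a dual 
equilibrium set. By Proposition 8-34, $\alpha_F = (\hat\eta^F N)_F$ and $\hat C(F) = 
\hat\eta^F \mathbf{1} < \infty$. Hence $\alpha_E = (\hat\eta^F N)_E$ and 

$$\alpha_E(I - P^E) = \left[\hat\eta^F N \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}\right]_E = (\hat\eta^F B^E)_E$$

by Proposition 8-20. Then 

$$\alpha\hat e^E = \alpha_E[(I - \hat P^E)\mathbf{1}] = [\alpha_E(I - P^E)]\mathbf{1} = \hat\eta^F h^E.$$

Since $\hat\eta^F h^E \le \hat\eta^F \mathbf{1} < \infty$, we have just verified the first condition of 
Proposition 8-28, namely that $\alpha\hat e^E < \infty$. The second condition is trivial for a 
subset of an equilibrium set. Hence $E$ is a dual equilibrium set and 
$\hat C(E) = \alpha\hat e^E = \hat\eta^F h^E$. 

The dual of this result states that any subset $E$ of an equilibrium set 
$F$ is an equilibrium set, and $C(E) = \eta^F \hat h^E$. 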

Proposition 8-37 : The union of a finite number of equilibrium sets is 
an equilibrium set. 

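Proof: Let $E_1, \ldots, E_n$ be equilibrium sets and let $E = \bigcup_{k=1}^{n} E_k$.
Then
\[
\sum_{i \in E} \alpha_i e_i^E \le \sum_{k=1}^{n} \sum_{i \in E_k} \alpha_i e_i^E
\le \sum_{k=1}^{n} \sum_{i \in E_k} \alpha_i e_i^{E_k} = \sum_{k=1}^{n} C(E_k),
\]
and if the process is in each $E_k$ only finitely often a.e., then it is in $E$
only finitely often a.e. Hence, by Proposition 8-28, $E$ is an equilibrium
set.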

Some of the classical results hold only if the support of a potential is 
a reasonably small set. It will always be satisfactory to have a finite 
set as support. A more general assumption is that the support is an 
equilibrium set. Since equilibrium sets include all finite sets, since a 
subset of an equilibrium set is an equilibrium set, and since finite 
unions of equilibrium sets are equilibrium sets, we may think of 
equilibrium sets as a class of “reasonably small” sets. 

Choquet has introduced a generalized notion of capacity. In our
case his definition takes the following form.

Definition 8-38: A Choquet capacity is a non-negative monotone
increasing set function $C$ such that, for any sets $A_1, A_2, \ldots, A_n$,
\[
C(A_1 \cap A_2 \cap \cdots \cap A_n) \le \sum_{i} C(A_i) - \sum_{i<j} C(A_i \cup A_j)
+ \sum_{i<j<k} C(A_i \cup A_j \cup A_k) - \cdots \pm C(A_1 \cup A_2 \cup \cdots \cup A_n).
\]

A simple way of constructing one of these capacities is to let $\pi$ be a
fixed starting distribution and to take $C(E)$ to be the probability of ever
entering $E$. That is, $C(E) = \pi h^E$. This set function is monotone
because $h^E$ is. The right side of the inequality in the definition of
capacity is the probability that all sets are entered. The left side is the
probability that the intersection of the sets is entered, which is one way
of entering all sets, though in general not the only way. Hence
Choquet's definition is satisfied. Since we may clearly also replace
$C(E)$ by $kC(E)$ with $k > 0$, $\pi$ may be any non-negative finite measure.
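
The construction above can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the four-state substochastic matrix `P` and the uniform starting measure `PI` are hypothetical choices, not taken from the text, and the check covers only the case $n = 2$ of the inequality in Definition 8-38, that is, $C(A \cap B) \le C(A) + C(B) - C(A \cup B)$. Hitting probabilities are obtained by solving the usual linear system exactly with rational arithmetic.

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def hitting_probs(P, E):
    """h^E: probability of ever entering E, for each starting state."""
    n = len(P)
    out = [i for i in range(n) if i not in E]
    A = [[Fr(int(i == j)) - P[i][j] for j in out] for i in out]
    b = [sum((P[i][j] for j in E), Fr(0)) for i in out]
    x = solve(A, b) if out else []
    h = [Fr(1)] * n          # h = 1 on E
    for i, xi in zip(out, x):
        h[i] = xi            # solved values off E
    return h

# Hypothetical transient chain: every row sum is < 1 (the deficit is killing).
P = [[Fr(0), Fr(1, 2), Fr(0), Fr(0)],
     [Fr(1, 4), Fr(0), Fr(1, 2), Fr(0)],
     [Fr(0), Fr(1, 4), Fr(0), Fr(1, 2)],
     [Fr(0), Fr(0), Fr(1, 2), Fr(0)]]
PI = [Fr(1, 4)] * 4          # hypothetical starting distribution

def capacity(E):
    """C(E) = pi h^E."""
    return sum(p * h for p, h in zip(PI, hitting_probs(P, E)))

def subsets(n):
    return [frozenset(i for i in range(n) if mask >> i & 1)
            for mask in range(2 ** n)]

def choquet_pairs_hold():
    """Check C(A n B) <= C(A) + C(B) - C(A u B) for all pairs of subsets."""
    S = subsets(len(P))
    return all(capacity(A & B) <= capacity(A) + capacity(B) - capacity(A | B)
               for A in S for B in S)
```

The right side of the two-set inequality is, by inclusion and exclusion, the probability that both sets are entered, which is why the check is expected to succeed for any choice of chain and starting measure.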

We shall show from this construction that our Definition 8-31 yields
a Choquet capacity on equilibrium sets. For convenience, we will give
a proof for P. For any fixed equilibrium set $F$ of P, Proposition 8-36
tells us that the capacity of any subset $E$ is $\eta^F h^E$. Hence the above
argument applies with $\pi = \eta^F$. Thus the Choquet conditions hold for
all subsets of $F$ and, since $F$ is any equilibrium set, they hold for all
equilibrium sets.

A more general method of obtaining a Choquet capacity within our
framework is as follows. Let $\pi$ be a measure, and let $h$ be a strictly
positive superregular function such that $\pi h < \infty$. Define $C(E) = \pi B^E h$.
Forming the $h$-process, we see that $C(E) = \pi^*(B^E)^* 1 = \pi^*(h^E)^*$
with $\pi^* 1 = \pi h < \infty$. Thus by the special case $C(E)$ is a Choquet
capacity for the $h$-process and hence satisfies the same axioms in the
original chain. Moreover, we see that the situation in the earlier case
is just the present case with $h = 1$. On the other hand, this more
general method includes a second interesting case: If $g$ is a pure
potential for which $\pi g$ is finite, then by Lemma 8-17,
\[ \pi B^E g = \pi g - \pi\,{}^E N f; \]
$\pi B^E g$ is a Choquet capacity which assumes its maximum value $\pi g$ on all
sets $E$ containing the support of $f$. If $\pi 1 = 1$, then $\pi B^E g$ in the game
interpretation is the expected gain when and after $E$ is reached.

Definition 8-31 is reasonable only for equilibrium sets, since otherwise 
it is possible to have = 0 . We could instead restrict the definition 
to finite sets and define the capacity of an infinite set as the supremum 
of the capacities of its finite subsets. We will show, under an additional 
assumption, that this new approach agrees with Definition 8-31 on 
equilibrium sets and assigns infinite capacity to all other sets. 

Proposition 8-39: If $\hat N$ has columns which tend to zero, then a set $E$
for which the supremum of the capacities of its finite subsets is finite is
an equilibrium set, and the supremum is the capacity of the set.

Proof: Let $E_1 \subset E_2 \subset \cdots$ be an increasing sequence of finite sets
whose union is $E$. We must prove that if $\sup C(E_n)$ is finite, then $h^E$
is a potential and $C(E) = \sup C(E_n)$. First we note that $h^E$ is the
monotone limit of $h^{E_n}$. If $i \in E_m$, then the $i$th component of $e^{E_n}$
decreases for $n > m$. Thus $\lim_n e^{E_n} = e$ exists. Since $\hat N$ has columns
that tend to zero and since $\alpha e^{E_n} = C(E_n) \le \sup C(E_n) < \infty$,
\[ (\text{dual } e^{E_n})\hat N \to (\text{dual } e)\hat N \]
by Proposition 1-58. By duality,
\[ h^{E_n} = N e^{E_n} \to N e, \]
and
\[ \alpha e = (\text{dual } e)1 = \sum_i \lim_n \alpha_i e_i^{E_n}, \]
which by Fatou's Theorem is
\[ \le \lim_n \alpha e^{E_n} = \lim_n C(E_n) = \sup_n C(E_n). \]
Hence $h^E = Ne$ and $\alpha e < \infty$. Thus $E$ is an equilibrium set. Also
\[ C(E) = \alpha e \le \sup_n C(E_n). \]
But $C(E_n) \le C(E)$ for every $n$, so that $\sup C(E_n) \le C(E)$. Thus
\[ C(E) = \sup_n C(E_n). \]



The converse, that an equilibrium set has the property that the 
supremum of the capacities of its finite subsets is finite, follows trivially 
from the monotonicity and the finiteness of capacity on equilibrium 
sets. 



4. Potential principles 

We shall now derive analogs of several of the fundamental theorems 
of classical potential theory. The first is the solution to the Dirichlet 
problem; in the uniqueness statement we shall need a lemma, for which 
we shall give two proofs. 

Lemma 8-40: If $P$ is an absorbing chain, then $P$ has no bounded
non-zero regular function.

Proof 1: Suppose $Ph = h$ with $|h| \le c1$. Since $h_i = \sum_j P_{ij} h_j$, we
have $|h_i| \le \sum_j P_{ij}|h_j|$, or $|h| \le P|h|$. Therefore, $|h| \le P^n|h| \le P^n(c1)$.
But $P^n 1$ is the probability that the process continues at least until time
$n$, which tends to zero as $n$ tends to infinity because $P$ is absorbing.
Hence $h = 0$.
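
The key step in Proof 1 is that $P^n 1$ tends to zero. A minimal sketch of this, assuming a hypothetical example that is not in the text (the symmetric walk on $\{0, 1, 2, 3\}$ absorbed at 0 and 3, with `Q` its transition matrix restricted to the transient states 1 and 2), is the following.

```python
from fractions import Fraction as Fr

def survival(Q, n):
    """Return Q^n 1: the probability, from each transient state,
    that the process has not yet been absorbed at time n."""
    v = [Fr(1)] * len(Q)
    for _ in range(n):
        v = [sum(q * x for q, x in zip(row, v)) for row in Q]
    return v

# Hypothetical example: transient part of the walk absorbed at 0 and 3.
Q = [[Fr(0), Fr(1, 2)],
     [Fr(1, 2), Fr(0)]]
```

In this example each step halves the survival probability, so any regular function bounded by $c$ is bounded by $c\,2^{-n}$ for every $n$ and must vanish.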

Proof 2: Let $a$ be the absorbing state of $P$ and let $t = t_a$ be the time
to absorption; $t$ is a stopping time since $P$ is absorbing. If $h = Ph$,
set 
Then $h'(x_n(\omega))$ in the $P$-process forms a bounded martingale. By
Corollary 3-16 to the second martingale systems theorem,
\[ 0 = h'_a = M[h'(x_t(\omega))] = M[h'(x_0(\omega))]. \]
But $x_0(\omega)$ is arbitrary, so that $h' = 0$ and $h = 0$.



The method in the second proof is of some importance, and we shall 
meet it again later. 



Theorem 8-41: Let $E$ be an arbitrary set of states, and suppose that
${}^E P$, the chain $P$ with $E$ made absorbing, is an absorbing chain. If $h_E$
is any bounded function defined on $E$, then there exists a unique
bounded function $h$ whose restriction to $E$ is $h_E$ and which is regular
outside $E$. The function is
\[ h = B^E \binom{h_E}{0}. \]




Proof: For existence, set 




The product is defined, since $h_E$ is bounded and $B^E$ has row sums one.
Then the restriction of $h$ to $E$ is $h_E$ because $(B^E)_{ij} = \delta_{ij}$ for $i$ and $j$ in $E$.
Moreover,



(I - P)A = (/ - P)B^ 



I - P^ 0\ /A^ 

0 0/1 0 



(/ - P^)hA 
0 /’ 



so that A is regular outside of E\ associativity is justified in the triple 

product because (/ -h P)B^[ I is finite-valued. 

\ 0 / 



For uniqueness, let 



k = 






be another such bounded function. Then $h - k$ is a bounded function
which is zero on $E$ and regular outside $E$. If $Q$ is the transition matrix
for the transient states of ${}^E P$, then $(h - k)_{\tilde E}$ is a non-zero bounded
$Q$-regular function, in contradiction to Lemma 8-40, since $Q$ is absorbing.
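
As a concrete instance of the Dirichlet problem, the sketch below solves for the bounded function that is regular off $E$ and has prescribed values on $E$. The example chain (the symmetric walk on $\{0, \ldots, 4\}$ with $E = \{0, 4\}$) and the helper names `solve` and `dirichlet` are assumptions made for illustration; the computation solves $h = Ph$ off $E$ directly rather than forming the matrix $B^E$.

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def dirichlet(P, boundary):
    """boundary maps each state of E to its prescribed value.
    Returns h with h = boundary on E and h_i = (Ph)_i for i outside E."""
    n = len(P)
    out = [i for i in range(n) if i not in boundary]
    A = [[Fr(int(i == j)) - P[i][j] for j in out] for i in out]
    b = [sum((P[i][j] * v for j, v in boundary.items()), Fr(0)) for i in out]
    x = solve(A, b)
    h = [Fr(0)] * n
    for j, v in boundary.items():
        h[j] = v
    for i, xi in zip(out, x):
        h[i] = xi
    return h

# Hypothetical example: symmetric walk on {0,...,4}, E = {0, 4} absorbing.
half = Fr(1, 2)
P = [[Fr(1), 0, 0, 0, 0],
     [half, 0, half, 0, 0],
     [0, half, 0, half, 0],
     [0, 0, half, 0, half],
     [0, 0, 0, 0, Fr(1)]]
h = dirichlet(P, {0: Fr(0), 4: Fr(1)})
```

With boundary values 0 at state 0 and 1 at state 4, the solution is the probability of reaching 4 before 0, which is linear in the starting state.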



Next we prove the Maximum Principle. 
Theorem 8-42: Let $E$ be an arbitrary set of states and suppose that $h$
is a finite-valued function such that $h = B^E h$. Then the supremum of
the values of $h$ is equal to the supremum of the values of $h$ on $E$. If
the states of $\tilde E$ communicate in ${}^E P$, if ${}^E P$ is absorbing, and if $h$ assumes
its maximum on $\tilde E$, then $h$ is constant on $\tilde E$.

Remark: Corresponding results hold for infima by replacing $h$ by $-h$.

Proof:
\[
h_i = \sum_{j \in E} B^E_{ij} h_j \le \sum_{j \in E} B^E_{ij}\Bigl(\sup_{j \in E} h_j\Bigr) \le \sup_{j \in E} h_j.
\]
Suppose that $h$ assumes its maximum on $\tilde E$, that ${}^E P$ is absorbing, and
that the states of $\tilde E$ communicate in ${}^E P$. Let $i$ be a state where the
maximum is assumed, and let $k$ be any state of $E$ that can be reached
in ${}^E P$ from $\tilde E$. Since the transient states of ${}^E P$ communicate, we have
$B^E_{ik} > 0$. Moreover,
\begin{align*}
h_i &= B^E_{ik} h_k + \sum_{j \ne k} B^E_{ij} h_j \\
    &\le B^E_{ik} h_k + \sum_{j \ne k} B^E_{ij} h_i && \text{since } h_j \le h_i \\
    &= B^E_{ik} h_k + h_i(1 - B^E_{ik}) && \text{since } B^E 1 = 1 \\
    &= h_i - B^E_{ik}(h_i - h_k).
\end{align*}
Therefore, $h_k = h_i$ for all such $k$. Then for any $m \in \tilde E$, $B^E_{mj} > 0$
precisely for those $j$ for which $B^E_{ij} > 0$, and $h_j = h_i$ for those $j$. Thus
\[
h_m = \sum_{j \in E} B^E_{mj} h_j = \sum_{j \in E} B^E_{mj} h_i = h_i.
\]


Corollary 8-43: If $g$ is a potential with support in a finite set, then $g$
is bounded.

Proof: Since $g = B^E g$ for any potential, we may apply Theorem
8-42. The supremum in $E$ is over a finite number of values.

Corollary 8-44: Let $E$ be an arbitrary set of states, and suppose that

(1) ${}^E P$ is an absorbing chain.

(2) the states of $\tilde E$ communicate in ${}^E P$.

(3) every state of $E$ can be reached in ${}^E P$ from $\tilde E$.

If $h$ is a bounded function regular outside of $E$, then $h$ cannot assume its
maximum on $\tilde E$ unless $h$ is constant everywhere.

Proof: By Theorem 8-41, $h$ is the unique solution to the Dirichlet
problem for the function $h_E$. Hence
\[ h = B^E \binom{h_E}{0}. \]
Multiplying through by $B^E$ and applying Proposition 5-8, we have
$B^E h = h$.

By Theorem 8-42, $h$ is constant on $\tilde E$. As shown in the proof of that
theorem, $h$ assumes the same constant value at every state of $E$ which
can be reached in ${}^E P$ from $\tilde E$.



The result that follows is the Principle of Domination. 

Theorem 8-45: Let $h$ be a finite-valued non-negative superregular
function, and let $g = Nf$ be a potential. If $h$ dominates $g$ on the
support of $f^+$, then $h$ dominates $g$ everywhere. If, in addition, $h$ is a
potential $N\bar f$, then $\alpha f \le \alpha \bar f$.

Proof: If $g$ is a pure potential supported in $E$, then $g = B^E g$ by
Proposition 8-19. But by Proposition 8-16, $B^E g$ is the pointwise
infimum of all non-negative superregular functions which dominate $g$
on $E$. Thus the first half is proved if $g$ is a pure potential. For arbitrary
$g$, write $g = Nf^+ - Nf^-$. We have $Nf^+ - Nf^- \le h$ on the
support of $f^+$, so that
\[ Nf^+ \le h + Nf^- \]
on the support of $f^+$. Applying the special case to the superregular
function $h + Nf^-$ and the potential $Nf^+$, we have
\[ Nf^+ \le h + Nf^- \quad \text{or} \quad g \le h \]
everywhere. Finally, if $h = N\bar f$, then
\[ Nf^+ \le N(\bar f + f^-) \]
implies
\[ \alpha f^+ \le \alpha(\bar f + f^-) \]
by Lemma 8-24. Hence $\alpha f \le \alpha \bar f$.

Next we prove the Principle of Balayage.

Theorem 8-46: If $g$ is a pure potential and if $E$ is any set of states,
then there is a unique pure potential $\bar g$ with support in $E$ such that
$\bar g = g$ on $E$. The potential $\bar g$ satisfies $\bar g \le g$ everywhere, and its total
charge does not exceed the total charge of $g$.

Proof: For existence, let $\bar g = B^E g$. Then $\bar g \le g$, $\bar g_E = g_E$, and $\bar g$
is superregular by conclusions (1) and (3) of Proposition 8-16. By
Proposition 8-25, $\bar g$ is a potential, and the total charge of $\bar g$ is less than or
equal to that of $g$.

For uniqueness, if $h$ were another such potential, we would have
\[ \bar g = B^E g = B^E h = h \]
by Proposition 8-15 and the fact that $\bar g_E = g_E = h_E$.

If $E$ is any set and if $g$ is a pure potential, we refer to the potential $\bar g$
of Theorem 8-46 as the balayage potential of $g$ on $E$.

Corollary 8-47: The balayage potential $\bar g = B^E g$ of $g$ on $E$ is the
pointwise infimum of all pure potentials which dominate $g$ on $E$.

Proof: Apply conclusion (2) of Proposition 8-16.

Corollary 8-48: The balayage potential of $g$ on $E$ is the supremum of
all pure potentials with support in $E$ which are dominated by $g$ on $E$.

Proof: Certainly the balayage potential does have the stated
property. Thus let $\bar g = B^E g$ and let $h$ be a potential with support in
$E$ and with $h_E \le g_E$. Then by Proposition 8-19, $h = B^E h$ and
\[ \bar g = B^E g \ge B^E h = h. \]

If g has support in E, then g itself is the balayage potential of g on E. 
In particular, is the balayage potential of on E for E an equilibrium 
set. 

Next we prove the Principle of Lower Envelope. 

Lemma 8-49: The pointwise infimum of non-negative superregular 
functions is non-negative superregular. 

Proof: It is clearly non-negative. If 
for all $\beta$, then
\[ P(\inf_\beta h^\beta) \le P h^\beta \le h^\beta \]
for all $\beta$, so that
\[ P(\inf_\beta h^\beta) \le \inf_\beta h^\beta. \]

Theorem 8-50: The pointwise infimum of pure potentials is a pure
potential.

Proof: Apply Lemma 8-49 and Proposition 8-25. 



Finally we prove the Principle of Condensers. We are to think of
two sets $E$ and $F$ as the two plates of a condenser with a positive charge
placed on $E$ and a negative charge placed on $F$ in such a way as to
produce a unit voltage drop. Since in equilibrium there should be a
uniform voltage on each plate and since the 0-value of voltage is
arbitrary, we will require that the potential be 1 on $E$ and 0 on $F$.
The theorem is proved for an equilibrium set $E$ with finite boundary,
that is, a set $E$ that can be entered or left only through a finite
set of states.



Theorem 8-51: Let $E$ be an equilibrium set with finite boundary, and
let $F$ be any disjoint set of states. Then there is a potential $g = Nf$
which is 1 on $E$ and 0 on $F$ and which is such that $f^+$ has support in $E$,
$f^-$ has support in $F$, and $\alpha f \ge 0$.

Proof: Let $g_i$ be the probability starting at $i$ that $E$ is reached
before $F$. Clearly, $g$ is 1 on $E$ and 0 on $F$. Furthermore, $0 \le g \le h^E$.
Since $P^n h^E \to 0$ for the equilibrium set $E$, we have $P^n g \to 0$.

Let $f = (I - P)g$. We are going to apply Corollary 8-8 to conclude
that $g$ is a potential with charge $f$, but to do so we must show that $\alpha|f|$
is finite. If $i$ is in the complement of $E \cup F$, then $g_i = (Pg)_i$; hence $f$
has support in $E \cup F$. Write
\[
P^{E \cup F} = \begin{pmatrix} X & Y \\ Z & W \end{pmatrix},
\]
with rows and columns indexed first by $E$ and then by $F$.
Noting that if $i \in E \cup F$, then $(Pg)_i$ is the probability that the next
entry to $E \cup F$ is in $E$, we have
\[
f_i = \begin{cases}
g_i - (Pg)_i = 1 - (Pg)_i = 1 - (X1)_i & \text{if } i \in E \\
0 - (Pg)_i = -(Z1)_i & \text{if } i \in F.
\end{cases}
\]
Hence
\[
f_{E \cup F} = \begin{cases} 1 - X1 & \text{on } E, \\ -Z1 & \text{on } F. \end{cases}
\]
Thus $f^+$ has support in $E$ and $f^-$ has support in $F$. Furthermore
$(X1)_i = 1$ except for the boundary of $E$; hence $f^+$ has finite support and
$\alpha f^+ < \infty$. Moreover
\[
\alpha f^- = \alpha_F Z1 = \sum_{j \in E} (\alpha_F Z)_j < \infty
\]
since $(\alpha_F Z)_j = 0$ except when $j$ is on the boundary of $E$. Thus $\alpha|f| < \infty$.
Hence $g$ is a potential. Finally $\alpha f \ge 0$ by Corollary 8-26.

5. Energy 

Classically, energy is the integral of the potential with respect to the 
charge, and we shall adopt the obvious analog of this definition. 
Throughout this section we shall write
\[ \mu = \text{dual } f, \qquad \nu = \text{dual } g. \]

Definition 8-52: If $g = Nf$ is a potential and $|\mu| N |f| < \infty$, then
its energy is defined to be $I(g) = \mu g = \nu f$, and $g$ is said to have finite
energy.

If all potentials are bounded, then all of them have finite energy.
For if $\bar g = N|f|$, then
\[
|\mu| N |f| = \sum_i \alpha_i |f_i| \bar g_i \le \bigl(\sup_i \bar g_i\bigr)(\alpha|f|) < \infty.
\]
In any case a potential of finite support has finite energy.

We can write energy either purely in terms of the charge or purely in
terms of the potential:
\[ I(g) = \mu(Nf) = \nu[(I - P)g]. \]

Since the dual of a number is the same number, we also have 
I(^) = = [v(/ - ^)]sr. 

If / is a charge for P, then, as noted after Theorem 8-3, / is also a 
charge for P. In the two processes we have 

l(Nf) = ^(Nf) = 

and 

i(i^/) = (,xN)f = 

and the energies are equal since the matrices associate by Corollary 1-5. 

The expression for $I(g)$ in Definition 8-52 disguises the fact that energy
depends only on the values of the potential on the support $E$. We
shall derive a simple dependence of $I(g)$ on $g_E$.

Proposition 8-53 : If g is a potential of finite energy with support in P, 
then 

I(^) = - P^)9e\ = [y^(I - 

Proof: Since $f$ has support in $E$,
\[ (I - P^E) g_E = f_E \]
by Proposition 8-19, and
\[ \nu_E[(I - P^E) g_E] = \nu_E f_E = \nu f = I(g). \]

The other half of the proposition is the dual of the first half. 

Classically energy is non-negative. We shall prove shortly that the
energy of a potential is non-negative provided it is finite. To do so,
we first introduce a definition. If $g = Nf$ and $\bar g = N\bar f$ are potentials
of finite energy, with $\mu = \text{dual } f$ and $\bar\mu = \text{dual } \bar f$, we define
\[ (g, \bar g) = \tfrac{1}{2}(\mu \bar g + \bar\mu g), \]
provided the matrix products are well defined. (We shall show soon
that this condition is always satisfied.)

Note that $(g, g) = I(g)$. We wish to show that $(g, \bar g)$ is an inner
product. The reader should verify that $(g, \bar g)$ satisfies (1), (2), and (4)
in general and (3) when all the potentials have finite support. We
shall prove (5) and the general case of (3) below.

(1) $(g, \bar g) = (\bar g, g)$.

(2) For every real number $c$, $(cg, \bar g) = c(g, \bar g)$.

(3) $(g + g', \bar g) = (g, \bar g) + (g', \bar g)$.

(4) If $g$ is a pure potential for which $(g, g) = 0$, then $g = 0$.

(5) $(g, g) \ge 0$ for all $g$.

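Lemma 8-54: If $g$ has support in a finite set $E$, then
\[
I(g) = \tfrac{1}{2} \sum_{i \in E} \Bigl[ (m_i \alpha_i + \pi_i) g_i^2 + \sum_{j \in E} \alpha_i P^E_{ij} (g_i - g_j)^2 \Bigr] \ge 0,
\]
where
\[
m_i = 1 - \sum_{j \in E} P^E_{ij} \ge 0 \quad \text{and} \quad \pi_i = \alpha_i - \sum_{k \in E} \alpha_k P^E_{ki} \ge 0.
\]

Proof: We shall apply Proposition 8-53. The matrices involved are
finite matrices so that distributivity and associativity hold. Hence
\begin{align*}
I(g) &= \nu_E g_E - \nu_E P^E g_E \\
&= \tfrac{1}{2} \sum_{i \in E} \Bigl[ \alpha_i g_i^2 + \alpha_i g_i^2 - 2 \sum_{j \in E} \alpha_i P^E_{ij} g_i g_j \Bigr] \\
&= \tfrac{1}{2} \sum_{i \in E} \Bigl[ \alpha_i \Bigl(1 - \sum_{j \in E} P^E_{ij}\Bigr) g_i^2 + \Bigl(\alpha_i - \sum_{k \in E} \alpha_k P^E_{ki}\Bigr) g_i^2 \\
&\qquad\qquad + \sum_{j \in E} \bigl(\alpha_i P^E_{ij} g_i^2 - 2\alpha_i P^E_{ij} g_i g_j + \alpha_i P^E_{ij} g_j^2\bigr) \Bigr] \\
&= \tfrac{1}{2} \sum_{i \in E} \Bigl[ (m_i \alpha_i + \pi_i) g_i^2 + \sum_{j \in E} \alpha_i P^E_{ij} (g_i - g_j)^2 \Bigr].
\end{align*}
Since $P^E$ is a transition matrix and $\alpha_E$ is $P^E$-superregular, $m$ and $\pi$ are
non-negative. Hence $I(g) \ge 0$.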
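
The algebraic identity behind Lemma 8-54 can be spot-checked numerically. In the sketch below the two-state matrix `P` (playing the role of $P^E$), the measure `alpha` (chosen so that $\alpha P \le \alpha$), and the vector `g` are all hypothetical values picked for illustration; the function names are likewise assumptions. The first function evaluates $\nu_E(I - P^E)g_E$ directly and the second evaluates the sum-of-squares form.

```python
from fractions import Fraction as Fr

def energy_direct(alpha, P, g):
    """nu_E (I - P^E) g_E, with nu_i = alpha_i * g_i."""
    n = len(g)
    return sum(alpha[i] * g[i] * (g[i] - sum(P[i][j] * g[j] for j in range(n)))
               for i in range(n))

def energy_squares(alpha, P, g):
    """(1/2) sum_i [ (m_i alpha_i + pi_i) g_i^2
                     + sum_j alpha_i P_ij (g_i - g_j)^2 ]."""
    n = len(g)
    total = Fr(0)
    for i in range(n):
        m_i = 1 - sum(P[i])                                     # row deficit
        pi_i = alpha[i] - sum(alpha[k] * P[k][i] for k in range(n))
        total += (m_i * alpha[i] + pi_i) * g[i] ** 2
        total += sum(alpha[i] * P[i][j] * (g[i] - g[j]) ** 2 for j in range(n))
    return total / 2

# Hypothetical data: P is substochastic and alpha P <= alpha componentwise.
alpha = [Fr(1), Fr(2)]
P = [[Fr(0), Fr(1, 2)],
     [Fr(1, 8), Fr(1, 4)]]
g = [Fr(3), Fr(-5)]
```

The identity itself holds for any vector $g$; non-negativity is what uses $m \ge 0$ and $\pi \ge 0$.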

From properties (1), (2), and (3), we can prove that Schwarz's
inequality holds for $g$ and $\bar g$ whenever they have finite support.

Lemma 8-55: If $g = Nf$ and $\bar g = N\bar f$ are two potentials of finite
support, then
\[ (g, \bar g)^2 \le I(g) I(\bar g). \]

Proof: By Lemma 8-54 we have
\[ I(xg - \bar g) = (xg - \bar g, xg - \bar g) \ge 0 \]
for all real $x$. Hence by properties (1), (2), and (3), we find that
\[ x^2 (g, g) - 2x (g, \bar g) + (\bar g, \bar g) \ge 0 \]
for all real $x$. If $(g, g) = 0$, then, for $-2x(g, \bar g) + (\bar g, \bar g)$ to be
non-negative for all $x$, it must be true that $(g, \bar g) = 0$, and the lemma is
trivial. Otherwise, the discriminant of the quadratic equation in $x$
must be non-positive, so that
\[ 4(g, \bar g)^2 - 4(g, g)(\bar g, \bar g) \le 0 \]
or
\[ (g, \bar g)^2 \le (g, g)(\bar g, \bar g) = I(g) I(\bar g). \]

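Lemma 8-56: Let $g = Nf$ and $\bar g = N\bar f$ be pure potentials of finite
energy, let
\[ E_1 \subset E_2 \subset E_3 \subset \cdots \]
be an increasing sequence of finite sets with union the set of all states $S$,
and let
\[
f^{(n)} = \binom{f_{E_n}}{0}, \quad \bar f^{(n)} = \binom{\bar f_{E_n}}{0}, \quad
g^{(n)} = N f^{(n)}, \quad \text{and} \quad \bar g^{(n)} = N \bar f^{(n)}.
\]
Then $(g, \bar g) = \lim_n (g^{(n)}, \bar g^{(n)})$.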



Proof: We have {g, g) = \{ixg + /Igr), and by symmetry it is enough 
to show that converges to fig. By monotone convergence, we 

have lim = g, since 0 < < ■ ■ ■ . Let 



Then 

and 



A<") = 



if isE„ 

0 otherwise. 



^(nyn) ^ 

lim = lim = g. 



The functions are non-negative and increasing; also /x is non- 
negative. Thus by monotone convergence, 

lim = lim = p. lim = fig. 



Lemma 8-57: If $g$ and $\bar g$ are pure potentials of finite energy, then
\[ (g, \bar g)^2 \le (g, g)(\bar g, \bar g). \]
Consequently $(g, \bar g) < \infty$.

Proof: Form the approximations to $g$ and $\bar g$ as in the statement of
Lemma 8-56. By Lemmas 8-54 and 8-55,
\[ (g^{(n)}, \bar g^{(n)})^2 \le (g^{(n)}, g^{(n)})(\bar g^{(n)}, \bar g^{(n)}). \]
Applying Lemma 8-56 to each factor, we obtain
\[ (g, \bar g)^2 \le (g, g)(\bar g, \bar g). \]

If $g$ and $\bar g$ are any potentials of finite energy, then
\[ |(g, \bar g)| \le (N|f|, N|\bar f|) \le \sqrt{I(N|f|)\, I(N|\bar f|)} < \infty. \]
Therefore $(g, \bar g)$ is always well defined. We can now prove (3) in
general by breaking charges into positive and negative parts.

Proposition 8-58: If $g$ has finite energy, then $I(g) \ge 0$.

Proof: Write $g = Nf = Nf^+ - Nf^-$, where $Nf^+$ and $Nf^-$ are
pure potentials. Then
\begin{align*}
I(g) &= (g, g) = (Nf^+ - Nf^-, Nf^+ - Nf^-) \\
&= I(Nf^+) - 2(Nf^+, Nf^-) + I(Nf^-) \\
&\ge I(Nf^+) - 2\sqrt{I(Nf^+)\, I(Nf^-)} + I(Nf^-) && \text{by Lemma 8-57} \\
&= \bigl(\sqrt{I(Nf^+)} - \sqrt{I(Nf^-)}\bigr)^2 \\
&\ge 0.
\end{align*}

We can finally prove Schwarz’s inequality for all potentials of finite 
energy by proceeding just as in the proof of Lemma 8-55. 

We now begin the proof of the fundamental result about energy, the
theorem that justifies the name "equilibrium potential." Unfortunately
the result fails for the most general transient chain, so that
some extra hypothesis is needed. We shall prove the theorem
(Theorem 8-61) under the hypothesis $P = \hat P$, a condition that is
satisfied in the classical case of the three-dimensional symmetric random
walk with $\alpha = 1^T$.

Lemma 8-59: The energy of the equilibrium potential on E is the 
capacity C(E). 

Proof: I(A^) = = ryfl = = C{E). 

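Lemma 8-60: If $E$ is an equilibrium set, if $g = Nf$ is a potential with
support in $E$, and if $P = \hat P$, then $(g, h^E) = \alpha f$.

Proof:
\begin{align*}
2(g, h^E) &= \mu h^E + \eta^E g \\
&= \mu h^E + \eta^E N f \\
&= \mu h^E + \mu \hat N e^E && \text{by duality} \\
&= \mu h^E + \mu N e^E && \text{since } N = \hat N \\
&= 2\mu h^E && \text{since } h^E = N e^E \\
&= 2\mu 1 && \text{since } f \text{ has support in } E \\
&= 2\alpha f.
\end{align*}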

Theorem 8-61: Suppose that $P$ is a chain in which $P = \hat P$. If $E$ is
an equilibrium set, then the equilibrium potential for $E$ minimizes
energy among all potentials of finite energy whose support is in $E$ and
whose total charge is $C(E)$.

Proof: Let $g = Nf$ be a potential with $\alpha f = C(E)$ and with support
in $E$. By Lemma 8-60, $(g, h^E) = C(E)$, and by Lemma 8-59, $I(h^E) =
C(E)$. Furthermore, $I(h^E) \ne 0$ by property (4). Therefore, by
Schwarz's inequality, we have
\[
I(g) \ge \frac{(g, h^E)^2}{I(h^E)} = \frac{C(E)^2}{C(E)} = C(E) = I(h^E).
\]

We shall see in the next section that the theorem need not be true if
$P \ne \hat P$.



6. The basic example 



In this section we shall work out what the results of the preceding 
five sections mean in terms of the basic example. 

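First we compute $\hat P$, the $\beta$-dual of $P$.
\[
\hat P_{ij} = \frac{\beta_j P_{ji}}{\beta_i} =
\begin{cases}
1 & \text{if } j = i - 1 \\
\dfrac{\beta_j q_{j+1}}{\beta_0} & \text{if } i = 0.
\end{cases}
\]
Thus the reverse process proceeds deterministically a step at a time to
the left until it reaches 0. From 0 it may step into any state and does
so with probability
\[ \hat P_{0j} = \beta_j - \beta_{j+1}. \]
Since $\sum_j \hat P_{0j} = 1 - \beta_\infty < 1$ and since 0 is reached from all states with
probability one, the extended chain for $\hat P$ is absorbing. We saw in
Section 5-10 that $P$ has no non-zero regular measure; on the other hand,
$\beta$ is regular for $\hat P$ since
\[ (\beta \hat P)_j = \beta_0(\beta_j - \beta_{j+1}) + \beta_{j+1} = \beta_j. \]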
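
The formulas for the reverse chain can be checked on a concrete choice of the $\beta$'s. The sketch below assumes the hypothetical sequence $\beta_i = \tfrac12 + 2^{-(i+1)}$ (so $\beta_0 = 1$ and $\beta_\infty = \tfrac12$), which is not taken from the text, together with the transition probabilities $P_{i,i+1} = p_{i+1} = \beta_{i+1}/\beta_i$ and $P_{i0} = q_{i+1}$ of the basic example. The function names are illustrative.

```python
from fractions import Fraction as Fr

BETA_INF = Fr(1, 2)

def beta(i):
    """Hypothetical choice: beta_0 = 1, decreasing to beta_inf = 1/2."""
    return BETA_INF + Fr(1, 2 ** (i + 1))

def P(i, j):
    """Basic example: step right with prob p_{i+1}, else return to 0."""
    p_next = beta(i + 1) / beta(i)
    if j == i + 1:
        return p_next
    if j == 0:
        return 1 - p_next
    return Fr(0)

def P_hat(i, j):
    """beta-dual: P_hat_ij = beta_j P_ji / beta_i."""
    return beta(j) * P(j, i) / beta(i)

def row0_partial_sum(n):
    """Sum of P_hat_0j for j < n; telescopes to 1 - beta_n."""
    return sum(P_hat(0, j) for j in range(n))

def beta_P_hat(j):
    """(beta P_hat)_j; only rows 0 and j+1 contribute."""
    return sum(beta(i) * P_hat(i, j) for i in range(j + 2))
```

The partial sums of row 0 approach $1 - \beta_\infty$, which is the deficit that lets the reverse process disappear from state 0.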



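From Section 5-10 we know that
\[
N_{ij} =
\begin{cases}
\dfrac{\beta_j}{\beta_\infty} & \text{if } i \le j \\[2mm]
\dfrac{\beta_j}{\beta_\infty} - \dfrac{\beta_j}{\beta_i} & \text{if } i > j.
\end{cases}
\]
Hence
\[
\hat N_{ij} = \frac{\beta_j N_{ji}}{\beta_i} =
\begin{cases}
\dfrac{\beta_j}{\beta_\infty} & \text{if } i \ge j \\[2mm]
\dfrac{\beta_j}{\beta_\infty} - 1 & \text{if } i < j.
\end{cases}
\]

We note that $N$ has columns tending to 0, but $\hat N$ has columns bounded
away from 0. The latter fact by itself implies that the extended chain
for $\hat P$ is absorbing.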

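Next we find the general form of potentials. In $P$ if $g = Nf$, then
\[
(1) \qquad g_i = (Nf)_i = \frac{1}{\beta_\infty} \sum_{j=0}^{\infty} \beta_j f_j - \frac{1}{\beta_i} \sum_{j=0}^{i-1} \beta_j f_j.
\]
In $\hat P$ if $g = \hat N f$, then
\[
(2) \qquad g_i = \frac{1}{\beta_\infty} \sum_{j=0}^{\infty} \beta_j f_j - \sum_{j=i+1}^{\infty} f_j.
\]
In either case, $g$ is finite-valued if $\beta|f| < \infty$, in agreement with Lemma
8-2. (For the reverse chain we have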



l/il’ 

j j 

and hence 2; \fj\ < QO.) 

Let /X = dual / and v = dual g. Then as required by duality 

= r S ft/< - I M, 

Hco j = Q y = 0 

Hco j = o j = o 

Thus /X is a left charge with potential measure v for P. 

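Theorem 8-4 demands that $f = (I - P)g$ when $g$ is a potential with
charge $f$. We have from (1)
\begin{align*}
(Pg)_i &= p_{i+1} g_{i+1} + q_{i+1} g_0 \\
&= \frac{\beta_{i+1}}{\beta_i} \Bigl[ \frac{\beta f}{\beta_\infty} - \frac{1}{\beta_{i+1}} \sum_{j=0}^{i} \beta_j f_j \Bigr]
 + \Bigl(1 - \frac{\beta_{i+1}}{\beta_i}\Bigr) \frac{\beta f}{\beta_\infty} \\
&= \frac{\beta f}{\beta_\infty} - \frac{1}{\beta_i} \sum_{j=0}^{i} \beta_j f_j.
\end{align*}
Hence $g_i - (Pg)_i = f_i$ and $(I - P)g = f$.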
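
Formula (1) and the identity $(I - P)g = f$ can be verified exactly for a charge of finite support. The sketch below is an illustration only: it assumes the hypothetical sequence $\beta_i = \tfrac12 + 2^{-(i+1)}$ and the form of $N$ for the basic example ($N_{ij} = \beta_j/\beta_\infty$ for $i \le j$ and $\beta_j/\beta_\infty - \beta_j/\beta_i$ for $i > j$); the charge `f` and all function names are made up for the example.

```python
from fractions import Fraction as Fr

BETA_INF = Fr(1, 2)

def beta(i):
    """Hypothetical choice: beta_0 = 1, decreasing to beta_inf = 1/2."""
    return BETA_INF + Fr(1, 2 ** (i + 1))

def N(i, j):
    """Assumed form of N for the basic example."""
    base = beta(j) / BETA_INF
    return base if i <= j else base - beta(j) / beta(i)

def g_direct(f, i):
    """(Nf)_i by direct summation over the finite support of f."""
    return sum(N(i, j) * fj for j, fj in enumerate(f))

def g_formula(f, i):
    """Formula (1): (beta f)/beta_inf - (1/beta_i) sum_{j<i} beta_j f_j."""
    bf = sum(beta(j) * fj for j, fj in enumerate(f))
    head = sum(beta(j) * f[j] for j in range(min(i, len(f))))
    return bf / BETA_INF - head / beta(i)

def charge(f, i):
    """[(I - P) g]_i = g_i - p_{i+1} g_{i+1} - q_{i+1} g_0."""
    p_next = beta(i + 1) / beta(i)
    return (g_formula(f, i) - p_next * g_formula(f, i + 1)
            - (1 - p_next) * g_formula(f, 0))

f = [Fr(2), Fr(-1), Fr(3)]   # hypothetical charge with support {0, 1, 2}
```

Recovering `f` from `g` in this way is the content of Theorem 8-4 for this example; outside the support of `f` the recovered charge is zero.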
For both P and P, we have 







and thus the condition of Proposition 8-10 is satisfied with 

k — Hence all potentials in both P and P are bounded. We can 

see directly the boundedness from (1) and (2). In (1) and (2) we have 
The estimate

• ¥ 

of Proposition 8-10 is better; it takes into account the cancellation of
the first and second terms in each expression. If $f \ge 0$ in (2), then the
$g_i$ increase monotonically to $\beta|f|/\beta_\infty$, so that the proposition gives
the best possible bound. In (1) we have $\lim_i g_i = 0$, in agreement with
the second half of Proposition 8-10 since the columns of $N$ tend to zero.
However, $\hat N$ does not have columns tending to zero, and $\lim_i g_i$ in (2)
is not necessarily 0.

We determine the regular functions and signed measures for $P$ and
$\hat P$ as follows. If $r$ is a $P$-regular function, then
\[ r_i = (Pr)_i = p_{i+1} r_{i+1} + q_{i+1} r_0. \]
Thus
\[ r_0 = p_1 r_1 + q_1 r_0, \qquad p_1 r_0 = p_1 r_1, \]
and
\[ r_0 = r_1. \]
Applying the same relation to successive values of $i$ gives $r_{i+1} = r_0$ for all $i$.
Hence only the constant functions are $P$-regular. Dually only multiples
of $\beta$ are regular signed measures for $\hat P$. Since $P$ has no regular signed
measures, $\hat P$ has no regular functions. (Recall that $\hat P 1 \ne 1$.) Therefore,
the non-negative superregular functions of $P$ are pure potentials
plus non-negative constants, whereas only pure potentials are
non-negative superregular functions for $\hat P$.

Next, we determine the equilibrium sets for $P$ and $\hat P$. It is clear
that $P$ will be in any infinite set infinitely often a.e. Hence only finite
sets are equilibrium sets. Let us verify this fact in terms of equilibrium
potentials.

Let $E$ be a finite set with $m$ as last element. Then
\[
e_i^E = \begin{cases} \dfrac{\beta_\infty}{\beta_m} & \text{if } i = m \\ 0 & \text{otherwise} \end{cases}
\]
and
\[ \beta e^E = \beta_\infty. \]
By (1),
\[
h_i^E = (N e^E)_i = \begin{cases} 1 & \text{if } i \le m \\ 1 - \dfrac{\beta_\infty}{\beta_i} & \text{if } i > m. \end{cases}
\]
It is clear probabilistically that these are the correct values for $h^E$:
when $i > m$ the only way to avoid hitting $E$ from $i$ is to march straight
out to the right. The probability of doing so is $\beta_\infty/\beta_i$. Thus we see that
$h^E$ satisfies the conditions of an equilibrium potential: All non-empty
finite sets have capacity $\beta_\infty$.

If $E$ is an infinite set, then $h^E = 1$ and $e^E = (I - P)h^E = 0$. Hence $h^E$
is not a potential, and $E$ is not an equilibrium set. The fact that the
supremum of the capacities of finite subsets of an infinite set $E$ is $\beta_\infty$
does not contradict Proposition 8-39, since $\hat N$ does not have columns
tending to zero.

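For the reverse chain $\hat P$ let $E$ be any set (finite or infinite) with least
element $m$. Then $h_i^E = 1$ for $i \ge m$. For $i < m$,
\[
h_i^E = \frac{\hat N_{im}}{\hat N_{mm}} = \frac{(\beta_m/\beta_\infty) - 1}{\beta_m/\beta_\infty} = 1 - \frac{\beta_\infty}{\beta_m}.
\]
The next to last equality follows from the fact that $m = 0$ is incompatible
with $i < m$. The process can escape from $E$ only via $m$, and
for $m > 0$
\[ e_m^E = [(I - \hat P) h^E]_m = \frac{\beta_\infty}{\beta_m}. \]
If $m = 0$,
\[ e_0^E = 1 - \sum_j \hat P_{0j} = \beta_\infty = \frac{\beta_\infty}{\beta_0}. \]
Hence, in either case,
\[ \beta e^E = \beta_\infty. \]
Thus all sets are equilibrium sets, and every set has capacity $\beta_\infty$. We
note that only the finite sets are equilibrium sets for both $P$ and $\hat P$, and
their capacity $\beta_\infty$ is the same in both, as predicted by Proposition 8-35.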

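The dual of Proposition 8-23 is that every non-negative superregular
measure is the increasing limit of pure potential measures of finite
support. We shall produce the charges for $\hat P$ which give rise to the
potential measures which increase to $\beta$. Let $E_m$ be the set $\{0, \ldots, m\}$.
The functions $h^{E_m} = N e^{E_m}$ are the functions which Proposition 8-23
gives as increasing to 1. Therefore, by duality the measures $(\text{dual } e^{E_m})\hat N$
should increase to $\beta$, and the charges we seek in $\hat P$ are the $\text{dual } e^{E_m}$. From
our above calculations we have
\[
(\text{dual } e^{E_m})_i = \begin{cases} \beta_\infty & \text{if } i = m \\ 0 & \text{otherwise} \end{cases}
\]
and
\[
[(\text{dual } e^{E_m}) \hat N]_i = \begin{cases} \beta_i & \text{if } i \le m \\ \beta_i - \beta_\infty & \text{if } i > m. \end{cases}
\]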
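
Next suppose as in Lemma 8-24 that $Nf \le N\bar f$ with $f \ge 0$ and $\beta \bar f$
finite. Since by (1)
\[ (Nf)_0 = \frac{1}{\beta_\infty} \beta f \]
and
\[ (N\bar f)_0 = \frac{1}{\beta_\infty} \beta \bar f, \]
we must have $\beta f \le \beta \bar f$. The proof for the reverse chain is not so simple,
however. If $\hat N f \le \hat N \bar f$ with $f \ge 0$ and $\beta \bar f$ finite, then
\[
\frac{1}{\beta_\infty} \beta f - \sum_{j=i+1}^{\infty} f_j \le \frac{1}{\beta_\infty} \beta \bar f - \sum_{j=i+1}^{\infty} \bar f_j
\]
for all $i$. According to the proof of Lemma 8-24, we multiply through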
by and take the limit on m. We have 



2 2 u 

j = m + l j = m + l 

As m ^ 00 , we obtain j8/ < j8/. 

Turning to the potential principles, we shall first illustrate the 
Principle of Domination (Theorem 8-45). We do so in P. Let / > 0 
and / > 0 be given, let E be the support off, and suppose that 
for i G E. For convenience, suppose 0 e E. From (2) we have 



-is, 

j>i r® j>i 






for i gE. 



Let k be in E, and let i be the largest state of E for which i < k {i exists 
since 0 gE). Then 



9. - # - Z /, a £ 



j>k 



iSa 

■’<» j>k 



= ^ - 2 



ls,^f- 

j>i Poo 



2/. 

}>i 



the next to last equality holding since /y = 0 for i < j < k. Hence 

g > g- 

Next we examine the Principle of Balayage (Theorem 8-46) for P, 
Let / > 0 be given, and for convenience let E be an infinite set con- 
taining 0. We wish to choose / with support in E so that g^ = gE- 
For i G E, we must have 
Let k be the next element of E greater than i. We must also have

;->fc Poo ;>;c 

Subtracting, we find that = 2y = t + i/r This equation determines 
all of/ except for/g. Adding the relations iov with /: > 0, we obtain 

2;>0 /; = 2;>0 /;• 

Since for we need 



- 1 ^ - 2 



i8 



y>o 



y>o 



we must choose j8/ = j3/. Thus set 
/o = /o + 2 ~ 



j>0 



f^=z 2 fj for i, k E E and j ^ ^ when i < j < k. 

j = i + l 



To see that we have actually chosen /g > 0, we note that j8y decreases 
with / so that if i, k e E with no j e E for i < j < k, then 



Pi.lifi.l - fi^l) + - fi^2) + • • • + hih - A) 

= ^i + lfi + 1 + ^i + 2fi + 2 + * * * + ^kfk ~ Pkfk 

= ft + l/i + 1 + • • • + ^kfk - Pk(fi + 1 + • • • + A) 

= (A 4-1 ~ Pk)fi + 1 + * • • + (^k-1 ~ ^k)fk-l 
> 0. 



We shall illustrate the Principle of Condensers (Theorem 8-51) for
$P$ in the case $E = \{0, 1, \ldots, a\}$ and $F = \{b, b+1, \ldots\}$ with $0 < a < b$.
We have

fl for i < a 



Then 



9i = \ ^ - J. for a < i < b 

for i > b. 



fi — 9i ~ Pi + l9i + l ~ 9i + l9o 



fiS. .. . 

^ II ^ = a 

Pa 

-qi + i if i > b 

^0 otherwise. 
Hence

= Pb ~ 2 + 1 

i^b 



We can verify from (1) that g is the potential of/, and we see that/^ 
has support in has support in F, and that j8/ > 0. 

We can show by example that the Principle of Condensers does not
hold for all equilibrium sets $E$. In $\hat P$ let $E$ be the set of even states
and let $F$ be the set of odd states. All sets are equilibrium sets for $\hat P$,
but $E$ does not have a finite boundary. If
\[
g_i = \begin{cases} 1 & \text{for } i \in E \\ 0 & \text{for } i \in F, \end{cases}
\]
the theorem requires that $g$ be a potential. But if $i \in F$,
\[ f_i = 0 - (\hat P g)_i = -1. \]

Hence 

(Sf-). = - 2 »./ = - 2 (r - ■)■ 

j odd j odd / 

The expression on the right side may be infinite if the /3’s are chosen 
properly. Let 

^ 1 1 

~ rr^ 2 

^00 = i- 

The jS’s determine the transition probabilities uniquely, and for this 
choice (i9/")o is infinite, a contradiction. 

Equation (1) gives us the following relation for energy in the
$P$-process.
\[
I(g) = \frac{1}{\beta_\infty} (\beta f)^2 - \sum_{i=0}^{\infty} f_i \sum_{j=0}^{i-1} \beta_j f_j.
\]
It is not difficult to see that (2) yields the same value for the same $f$.
But it is not easy to see that $I(g) \ge 0$ if $g$ is not a pure potential. We
shall now show that Theorem 8-61 fails if the assumption $P = \hat P$ is
dropped. In $P$ let $E = \{0, 1, \ldots, m\}$. We have seen that $C(E) = \beta_\infty$.
Thus any potential with total charge $\beta f = \beta_\infty$ equal to that of the
equilibrium potential has energy
\[
\beta_\infty - \sum_{i=0}^{\infty} f_i \sum_{j=0}^{i-1} \beta_j f_j.
\]
The equilibrium potential has energy $\beta_\infty$, which is a maximum (not a
minimum) among pure potentials.


For example, let 






if i = 0 


fi = ' 




if i = 1 




0 


otherwise. 



Then j8/ = and 



— ^00 foiPofl) — poo — 7^ Poo^ ^ P<* 
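
The failure of the minimum-energy property in the basic example can be reproduced numerically. The following sketch is illustrative only and rests on assumptions that are not in the text: the sequence $\beta_i = \tfrac12 + 2^{-(i+1)}$, the set $E = \{0, 1\}$, and the charge `spread`, which splits the total charge $\beta_\infty$ evenly between states 0 and 1. Energy is computed as $\sum_i \beta_i f_i (Nf)_i$, using the form of $N$ for the basic example.

```python
from fractions import Fraction as Fr

BETA_INF = Fr(1, 2)

def beta(i):
    """Hypothetical choice: beta_0 = 1, decreasing to beta_inf = 1/2."""
    return BETA_INF + Fr(1, 2 ** (i + 1))

def N(i, j):
    """Assumed form of N for the basic example."""
    base = beta(j) / BETA_INF
    return base if i <= j else base - beta(j) / beta(i)

def total_charge(f):
    """beta f."""
    return sum(beta(j) * fj for j, fj in enumerate(f))

def energy(f):
    """I(Nf) = mu (Nf), with mu_i = beta_i f_i."""
    idx = range(len(f))
    return sum(beta(i) * f[i] * sum(N(i, j) * f[j] for j in idx) for i in idx)

def energy_closed_form(f):
    """(beta f)^2 / beta_inf - sum_i f_i sum_{j<i} beta_j f_j."""
    cross = sum(f[i] * sum(beta(j) * f[j] for j in range(i))
                for i in range(len(f)))
    return total_charge(f) ** 2 / BETA_INF - cross

# Equilibrium charge of E = {0, 1}: all of it sits on the last element.
equilibrium = [Fr(0), BETA_INF / beta(1)]
# Hypothetical competitor with the same total charge beta_inf.
spread = [BETA_INF / 2, BETA_INF / (2 * beta(1))]
```

Both charges are non-negative with the same total charge, yet the competitor has strictly smaller energy, so the equilibrium potential does not minimize energy here.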



7. An unbounded potential 



In Proposition 8-10 we saw that a sufficient condition for all potentials 
to be bounded is that a be chosen so that for all i. The 

purpose of this section is to show that unbounded potentials may exist 
when this hypothesis is not satisfied; the potential we exhibit will have a 
bounded charge whose support is at the same time an equilibrium set 
and a dual equilibrium set. 

The chain P will be a modification of sums of independent random 
variables on the line with Pi = \ and = §. Let 



and 



Pui-i = I 

PiA + i — i 



for i < 0 




Pui- 



2 

Ji 



for i > 0. 






1 

Wi 



If the process is watched only when it changes states, it becomes the 
Vi = T-i — i process, so that we may compute H from the latter 
chain. 

h„4' 

\(^y-* if i < j. 

Therefore, 



Pa — Pi.i + l'^ ■+■ Pi,i-1'\ + 



P,,\ 



— 1 — 



N 



a 



1 



2 

Pjj-i 



and 
Hence



3 


if j < 0, 


i > j 




if j < 0, 


i < j 


3j 


if j > 0, 


i > j 


CO 

1 


if j > 0, 


i < j. 



Let $E = \{1, 2, 3, \ldots\}$. Since the process goes toward $-\infty$ with
probability one, it can be in $E$ only finitely often a.e. Moreover,
$e_i^E = 0$ unless $i = 1$, so that $\alpha e^E < \infty$ for any choice of $\alpha$. Thus $E$ is an
equilibrium set.

We shall take $\alpha$ to be the zeroth row of $N$. Then



aj = iVoy = 



If we calculate P, we find that 

= Pii 
A.i + l = f 
A.i-1 = i 



3 if j < 0 

mV 

for i > 0, 

1 for i < 0, 



and 

Pqi = ^O.-l = 

With probability one the $\hat{P}$ process reaches 0 from all states, and from
there it can disappear. Hence the extended chain for $\hat{P}$ is absorbing,
and $\hat{P}$ is in any set only finitely often a.e. As before, $\alpha \hat{e}^E < \infty$, and
E is therefore an equilibrium set for $\hat{P}$.

Thus E is both an equilibrium set and a dual equilibrium set. We
shall choose a bounded charge with support in E. Let



1 \i ieE 
0 otherwise. 



Then 



«/= 2 w = 

y>o 



3(^) 
(1 - 



= 6 . 



Thus / is a charge. Its potential is 

f 2 if i < 0 

y>o 

2 + 2 if i > 0. 

U = i i>i 



9i = 
Summing these expressions, we find that

for i < 0 
+ 3i + 4) for i > 0. 

Thus $\lim_{i \to +\infty} g_i = +\infty$, and g is unbounded.

We note that $P^n g$ for large n is a weighted sum of g values along paths
that the process is likely to take. Thus the fact that $\lim_{i \to +\infty} g_i = +\infty$
does not contradict $P^n g \to 0$, since the process moves toward $+\infty$ with
probability zero. On the other hand, in the direction that the process
does go, namely $-\infty$, we do have $\lim_{i \to -\infty} g_i = 0$.

8. Applications of potential-theoretic methods 

Many useful quantities for transient chains arise as means of non-negative
random variables: $h_i = M_i[z]$. To compute h we can often use
a systems theorem argument (Theorem 4-11 with the random time
identically one) to obtain an expression of the form $h = Ph + f$.
Under appropriate circumstances we may write $(I - P)h = f$, and if h
can be shown to be a potential, we conclude $h = Nf$. The purpose of
this section is to give some sufficient conditions under which all these
steps are valid and to apply the results.

We first restrict our attention to the case of a bounded non-negative
random variable z. Later in this section we extend our results to
obtain Theorem 8-67, which is a powerful tool applicable even if the
vector h is not necessarily finite-valued.

To maximize the number of potentials, we choose a row of N as $\alpha$.
Then all finite-valued functions of the form Nf are potentials, and a set
E is an equilibrium set if and only if the process is in E only finitely
often a.e. Let z be a bounded non-negative random variable, and let
$z^{(n)}(\omega) = z(\omega_n)$. We shall assume that $z^{(1)} \le z$. Then $M_i[z]$ is finite
for all i, and $M_i[z^{(1)}] \le M_i[z]$. Define

$$h_i = M_i[z]$$

and

$$f_i = M_i[z - z^{(1)}] \ge 0.$$

Lemma 8-62: The column vector h is superregular and satisfies
$(I - P)h = f$. Furthermore, $z^{(\infty)} = \lim_{n \to \infty} z^{(n)}$ exists and is finite.

Proof: By Theorem 4-11 with the random time taken to be
identically one, we have

$$M_i[z^{(1)}] = \sum_k P_{ik} M_k[z] = (Ph)_i.$$

Therefore,

$$[(I - P)h]_i = M_i[z] - M_i[z^{(1)}] = M_i[z - z^{(1)}] = f_i.$$

Since $f \ge 0$, h is superregular; and since $z^{(1)} \le z$, we have

$$z \ge z^{(1)} \ge z^{(2)} \ge \cdots \ge 0.$$

Therefore $z^{(\infty)}$ exists and is finite.

Lemma 8-63: The function h satisfies $h = Nf$ if and only if $z^{(\infty)} = 0$
almost everywhere.

Proof: By dominated convergence,

$$M_i[z^{(\infty)}] = \lim_n M_i[z^{(n)}],$$

and by Theorem 4-11 with the random time n,

$$\{M_i[z^{(n)}]\} = P^n\{M_i[z]\} = P^n h.$$

Hence

$$M_i[z^{(\infty)}] = \lim_n (P^n h)_i.$$

By Theorem 5-10,

$$h_i = (Nf)_i + \lim_n (P^n h)_i.$$

Thus $h = Nf$ if and only if $M_i[z^{(\infty)}] = 0$ for every i. Since $z^{(\infty)} \ge 0$,
$M_i[z^{(\infty)}] = 0$ for all i if and only if $z^{(\infty)} = 0$ a.e.

Lemma 8-64: If either of these conditions is satisfied, then $h = Nf$:

(1) There exists an equilibrium set E such that $z(\omega) = 0$ for every
path $\omega$ which does not go through E.

(2) The enlarged chain is absorbing and $z(\omega) = 0$ for every path $\omega$
which begins in the absorbing state.

Proof: The second condition is just the first for E the set of all
transient states. For the first condition, on every path which does not
pass through E, $z(\omega) = 0$, so that $z^{(n)}(\omega) = 0$ for every $n \ge 1$ on such
paths. On almost all paths which do pass through E, there is an n
which is a function of the path and which denotes the last time the
process passes through E on that path. Therefore, on almost all paths
there is an n depending on the path such that $z^{(n)}(\omega) = 0$. Hence
$z^{(\infty)} = 0$ a.e., and the result follows by Lemma 8-63.
We shall now generalize our considerations. Let $z(\omega)$ be a non-negative
random variable with $h_i = M_i[z]$ not necessarily finite. Suppose
$z \ge z^{(1)}$, and define ${}^m z(\omega) = \min(m, z(\omega))$. We agree to set
$z - z^{(1)} = 0$ at all points where $z^{(1)}$ is infinite. Then

$$z(\omega) = \lim_{m \to \infty} {}^m z(\omega).$$

The crucial property of the functions ${}^m z$ is that

$$({}^m z)^{(1)} = {}^m(z^{(1)}).$$

We denote the common value of $({}^m z)^{(1)}$ and ${}^m(z^{(1)})$ by ${}^m z^{(1)}$, and we
define ${}^m z^{(n)}$ analogously.

Lemma 8-65: If $z \ge z^{(1)} \ge 0$, then $z - z^{(1)} \ge {}^m z - {}^m z^{(1)} \ge 0$.

Proof:

$${}^m z - {}^m z^{(1)} = \begin{cases} z - z^{(1)} & \text{if } z \le m \\ 0 & \text{if } z^{(1)} \ge m \\ m - z^{(1)} & \text{if } z \ge m \ge z^{(1)}. \end{cases}$$

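Define vectors ${}^m h$ and ${}^m f$ by

$${}^m h_i = M_i[{}^m z]$$

and

$${}^m f_i = M_i[{}^m z - {}^m z^{(1)}].$$

Lemma 8-66: ${}^m f \le {}^{m+1} f$ and $\lim_m {}^m f = f$.

Proof: We note that ${}^m({}^{m+1} z) = {}^m z$. Applying Lemma 8-65 to
${}^{m+1} z$, we find that ${}^{m+1} z - {}^{m+1} z^{(1)} \ge {}^m z - {}^m z^{(1)}$. Then ${}^m f \le {}^{m+1} f$.
Since $\lim_m ({}^m z - {}^m z^{(1)}) = z - z^{(1)}$ a.e., $\lim_m {}^m f = f$ by the Monotone
Convergence Theorem.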

Theorem 8-67: Let z be a non-negative random variable. Define

$$z^{(1)}(\omega) = z(\omega_1)$$

$$z(\omega) - z^{(1)}(\omega) = 0 \text{ when } z^{(1)}(\omega) = +\infty$$

$$h_i = M_i[z]$$

$$f_i = M_i[z - z^{(1)}],$$

and suppose that $z \ge z^{(1)}$. If z satisfies either one of the following
conditions, then $h = Nf$:

(1) There exists an equilibrium set E such that $z(\omega) = 0$ for every
path $\omega$ which does not go through E.

(2) The extended chain is absorbing and $z(\omega) = 0$ for every path $\omega$
which begins in the absorbing state.

Proof: For each m, ${}^m z$ is bounded. If one of the conditions applies
to z, then it applies to ${}^m z$ because $0 \le {}^m z \le z$. Hence by Lemma
8-64,

$${}^m h = N\,{}^m f.$$

By Lemma 8-66, ${}^m f$ increases to f, and by monotone convergence ${}^m h$
increases to h. Therefore $h = Nf$ by the Monotone Convergence
Theorem.

We now apply Theorem 8-67 in four special cases. 

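First let P be an absorbing chain with fundamental matrix N. We
define $a_i^{(r)} = M_i[a(\omega)^r]$, where $a(\omega)$ is the absorption time defined in
Chapter 5. The column vector $a^{(r)}$ is indexed by the transient states
of P.

Proposition 8-68: If P is an absorbing chain, then

$$a^{(r)} = N\left[\sum_{m=1}^{r-1} \binom{r}{m} Q a^{(m)} + 1\right].$$

Proof: Start the process in a transient state i, and let a be the
absorption time. Since i is transient,

$$a^r(\omega) = (a + 1)^r(\omega_1)$$

so that

$$a^r(\omega) - a^r(\omega_1) = (a + 1)^r(\omega_1) - a^r(\omega_1)$$

or

$$(a^r - (a^r)^{(1)})(\omega) = ((a + 1)^r - a^r)(\omega_1).$$

By Theorem 4-11 with the random time identically one,

$$M_i[a^r - (a^r)^{(1)}] = \sum_k P_{ik} M_k[(a + 1)^r - a^r]$$

$$= \sum_{k\ \text{abs.}} P_{ik} M_k[1] + \sum_{k\ \text{trans.}} P_{ik} \left[\sum_{m=0}^{r-1} \binom{r}{m} M_k[a^m]\right]$$

$$= 1 + \sum_{m=1}^{r-1} \binom{r}{m} (Q a^{(m)})_i,$$

since $\sum_{k\ \text{abs.}} P_{ik} + \sum_{k\ \text{trans.}} Q_{ik} = 1$. In Theorem 8-67 let $z = a^r$. Then z
satisfies $z \ge z^{(1)}$ and condition (2) of the theorem holds for z. As we
have just shown,

$$f = \sum_{m=1}^{r-1} \binom{r}{m} Q a^{(m)} + 1,$$

and we also have $h = a^{(r)}$. Hence $h = Nf$ by the theorem.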
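The recursion of Proposition 8-68 is easy to evaluate for a small absorbing chain. The sketch below is illustrative only: the two-state matrix `Q`, the helper names and the truncation length of the comparison series are all assumptions, not taken from the text. It computes the moment vectors exactly with rational arithmetic and compares them with a truncated series $\sum_n n^r \Pr_i[a = n]$.

```python
from fractions import Fraction as F
from math import comb


def inverse(A):
    """Invert a small square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [x - fac * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Hypothetical transient part Q of an absorbing chain (row sums < 1).
Q = [[F(1, 2), F(1, 4)], [F(1, 3), F(1, 3)]]
n = len(Q)
N = inverse([[F(int(i == j)) - Q[i][j] for j in range(n)] for i in range(n)])


def moments(r_max):
    """a[r] = N( sum_{m=1}^{r-1} C(r, m) Q a[m] + 1 ), as in Proposition 8-68."""
    a = {}
    for r in range(1, r_max + 1):
        f = [F(1)] * n
        for m in range(1, r):
            Qa = mat_vec(Q, a[m])
            f = [fi + comb(r, m) * q for fi, q in zip(f, Qa)]
        a[r] = mat_vec(N, f)
    return a


def moment_by_series(r, terms=400):
    """Truncated sum over n of n^r * Pr_i[a = n], using Pr_i[a = n] = (Q^(n-1)(1 - Q1))_i."""
    v = [float(1 - sum(row)) for row in Q]
    Qf = [[float(x) for x in row] for row in Q]
    total = [0.0] * n
    for step in range(1, terms + 1):
        total = [t + step ** r * x for t, x in zip(total, v)]
        v = mat_vec(Qf, v)
    return total


a = moments(3)
```

For r = 2 the recursion reduces to $a^{(2)} = (2N - I)N1$, which gives a quick sanity check on the binomial coefficients.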

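Corollary 8-69: Let P be an absorbing chain. Then there exist real
numbers c and d such that

$$a^{(r)} \le cNa^{(r-1)} \le da^{(r)}.$$

In particular, $a^{(r)}$ is finite-valued if and only if $Na^{(r-1)}$ is finite-valued.

Proof: By Proposition 8-68,

$$a^{(r)} = \sum_{m=1}^{r-1} \binom{r}{m} NQa^{(m)} + N1.$$

Since $a \ge 1$, we have $a^{(m)} \le a^{(r-1)}$ for $m \le r - 1$, even if both sides are infinite. Hence

$$a^{(r)} \le (2^r - 2)NQa^{(r-1)} + N1$$

$$= (2^r - 2)(N - I)a^{(r-1)} + N1 \quad \text{since } N - I = NQ$$

$$\le (2^r - 2)Na^{(r-1)} + N1$$

$$\le (2^r - 1)Na^{(r-1)} \quad \text{since } 1 \le a^{(r-1)}.$$

For the other inequality,

$$a^{(r)} \ge \binom{r}{r-1} NQa^{(r-1)} \ge NQa^{(r-1)},$$

so that $a^{(r)} + a^{(r-1)} \ge NQa^{(r-1)} + a^{(r-1)} = Na^{(r-1)}$. Hence $Na^{(r-1)} \le 2a^{(r)}$,
since $a^{(r-1)} \le a^{(r)}$.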

As a second example, let P be a transient chain, and for any two
transient states i and j define

$$W_{ij} = M_i[n_j^2].$$

The reader may verify with the aid of Theorem 4-11 that if $z = n_j^2$,
then

$$M_i[z - z^{(1)}] = (2N_{dg} - I)_{ij}.$$

Now {j} is an equilibrium set, and $n_j^2 = 0$ for all paths not going through
this set. Hence, for fixed j, the column vector h with $h_i = W_{ij}$
satisfies condition (1) of Theorem 8-67. Therefore,

$$W = N(2N_{dg} - I).$$

We note that W is finite-valued; similarly one shows that $M_i[n_j^r] < \infty$
for r > 2.
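The identity $W_{ij} = N_{ij}(2N_{jj} - 1)$ can be compared with a direct computation of $M_i[n_j^2]$. The sketch below is an illustration under stated assumptions: the two-state transient matrix `P`, the truncation lengths and all function names are hypothetical. The check tracks the joint distribution of the current state and the running count of visits to j, and accumulates the squared count whenever the process disappears.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def fundamental_matrix(P, terms=400):
    """N = I + P + P^2 + ..., truncated after `terms` powers."""
    n = len(P)
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    N = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        N = [[a + b for a, b in zip(nr, pr)] for nr, pr in zip(N, power)]
        power = mat_mul(power, P)
    return N


def second_moment_by_dp(P, i, j, steps=400):
    """M_i[n_j^2] from the joint law of (state, number of visits to j so far)."""
    n = len(P)
    mass = [dict() for _ in range(n)]
    mass[i][1 if i == j else 0] = 1.0
    total = 0.0
    for _ in range(steps):
        new = [dict() for _ in range(n)]
        for s in range(n):
            leave = 1.0 - sum(P[s])          # probability of disappearing from s
            for c, p in mass[s].items():
                total += c * c * p * leave
                for t in range(n):
                    if P[s][t] > 0:
                        c2 = c + (1 if t == j else 0)
                        new[t][c2] = new[t].get(c2, 0.0) + p * P[s][t]
        mass = new
    return total


# Hypothetical transient chain: row sums are below one, the rest disappears.
P = [[0.5, 0.25], [1.0 / 3.0, 1.0 / 3.0]]
N = fundamental_matrix(P)
W = [[N[i][j] * (2.0 * N[j][j] - 1.0) for j in range(2)] for i in range(2)]
W_direct = [[second_moment_by_dp(P, i, j) for j in range(2)] for i in range(2)]
```

Subtracting $N_{ij}^2$ from `W[i][j]` gives the variance of $n_j$, which is the quantity treated in Problem 6 of this chapter.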

Next let E be an equilibrium set and let z be the time at which the
process is in E for the last time (or 0 if E is never reached). Set
= Mf[z]. Then 

$$z - z^{(1)} = \begin{cases} 1 & \text{if E is ever reached after time 0} \\ 0 & \text{otherwise.} \end{cases}$$

Hence $M_i[z - z^{(1)}] = \bar{h}_i^E$, the probability that, from i, E is reached
after time 0. The random variable z satisfies condition (1) of the
theorem. Hence

= Nh^. 



Finally, let E be an equilibrium set and let j be any state. Let $z_j$
be the number of times in j before E is left for the last time (or 0 if E
is never reached). Define

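Then $z_j$ satisfies condition (1) of the theorem, and we have

$$z_j - z_j^{(1)} = \begin{cases} 1 & \text{if } x_0(\omega) = j \text{ and E is ever reached} \\ 0 & \text{otherwise.} \end{cases}$$

Hence

$$M_i[z_j - z_j^{(1)}] = \delta_{ij} h_i^E,$$

and

$$(Nf)_i = \sum_k N_{ik}\,\delta_{kj} h_k^E = N_{ij} h_j^E.$$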



9. General denumerable stochastic processes 

We shall show that any denumerable stochastic process can be 
represented within a transient Markov chain in such a manner that 
potential theory applied to the chain yields corresponding results for 
the stochastic process. 

Throughout this section we shall deal with a probability space $\Omega$
with measure $\mu$, and a fixed sequence $\{\mathcal{R}_n\}$ of partitions of $\Omega$. Each $\mathcal{R}_n$
has a denumerable number of cells, and $\mathcal{R}_n \subset \mathcal{R}_{n+1}$. For convenience,
we assume that $\mathcal{R}_0 = \{\Omega\}$. We recall that $(f_n, \mathcal{R}_n)$ is a
stochastic process if $f_n$ is constant on each cell of $\mathcal{R}_n$. (This condition
is Definition 2-5 expressed in terms of partitions.) If $U \in \mathcal{R}_n$, define
$f_n(U)$ to be this constant value.

Definition 8-70: If $\{\mathcal{R}_n\}$ is a sequence of partitions, the space-time
Markov chain for $\{\mathcal{R}_n\}$ is defined to be a Markov chain whose states are
all ordered pairs $\langle U, n \rangle$, where $U \in \mathcal{R}_n$ and $\mu(U) > 0$, and whose
transition probabilities are

$$P_{\langle U,n \rangle, \langle V,m \rangle} = \begin{cases} \dfrac{\mu(V)}{\mu(U)} & \text{if } m = n + 1 \text{ and } V \subset U \\ 0 & \text{otherwise.} \end{cases}$$

The chain is started in the state $\langle \Omega, 0 \rangle$, which will be called state 0.



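Proposition 8-71: The space-time chain is transient, and if $i = \langle U, n \rangle$,
then

$$N_{0i} = H_{0i} = P^{(n)}_{0i} = \mu(U).$$

Proof: State i can be entered only on the nth step, as is clear from
Definition 8-70. Hence the chain is in i at most once, and $N_{0i} =
H_{0i} = P^{(n)}_{0i}$. Along the path from 0 to i there is a unique sequence of
cells $U_k \in \mathcal{R}_k$ with

$$U \subset U_{n-1} \subset U_{n-2} \subset \cdots \subset U_1 \subset \Omega.$$

Then

$$P^{(n)}_{0i} = P_{0, \langle U_1, 1 \rangle} P_{\langle U_1, 1 \rangle, \langle U_2, 2 \rangle} \cdots P_{\langle U_{n-1}, n-1 \rangle, \langle U, n \rangle}$$

$$= \frac{\mu(U_1)}{\mu(\Omega)} \cdot \frac{\mu(U_2)}{\mu(U_1)} \cdots \frac{\mu(U)}{\mu(U_{n-1})} = \mu(U).$$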



For example, let $\Omega$ be a sequence space with some probability
measure $\mu$, and let $\mathcal{R}_n$ be the partition determined by the first n
outcomes. As usual, we may think of a cell of $\mathcal{R}_n$ as a path
$\langle i_1, i_2, \ldots, i_n \rangle$ of length n in $\Omega$. A state of the space-time chain may
also be thought of as such a path, and the chain moves from
$\langle i_1, i_2, \ldots, i_n \rangle$ to $\langle j_1, \ldots, j_{n+1} \rangle$
only if $j_k = i_k$ for $1 \le k \le n$. The probability of such a transition is

$$\Pr[x_{n+1} = j_{n+1} \mid x_1 = i_1 \wedge \cdots \wedge x_n = i_n].$$

The starting state 0 may be thought of as the empty sequence.
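The sequence-space picture can be made concrete with a small computation. The sketch below is an illustration, not part of the original text: it assumes a Polya-urn measure on 0-1 sequences (one red and one black ball initially, each drawn colour replaced together with one more of the same colour), truncates the space-time chain at a fixed depth, and uses hypothetical names throughout. States are prefixes, the transition probabilities are the ratios of Definition 8-70, and the last function evaluates $N_{0i}$ so that it can be compared with $\mu(U)$ as in Proposition 8-71.

```python
from fractions import Fraction as F
from itertools import product


def mu(prefix):
    """Measure of the cell of all sequences starting with `prefix` (Polya urn, assumed)."""
    red, black = 1, 1
    p = F(1)
    for x in prefix:
        if x == 1:
            p *= F(red, red + black)
            red += 1
        else:
            p *= F(black, red + black)
            black += 1
    return p


DEPTH = 4  # truncation depth, for illustration only
states = [pre for n in range(DEPTH + 1) for pre in product((0, 1), repeat=n)]


def transition(u, v):
    """mu(V)/mu(U) if V extends U by exactly one outcome, else 0."""
    if len(v) == len(u) + 1 and v[:len(u)] == u:
        return mu(v) / mu(u)
    return F(0)


def visits_from_start(target):
    """N_{0,target}: the only non-zero term is the n-step probability, n = len(target)."""
    dist = {(): F(1)}
    for _ in range(len(target)):
        nxt = {}
        for u, p in dist.items():
            for x in (0, 1):
                v = u + (x,)
                nxt[v] = nxt.get(v, F(0)) + p * transition(u, v)
        dist = nxt
    return dist.get(target, F(0))
```

Because the urn measure is not a product measure, the example also shows that the space-time chain is Markov even when the underlying outcome process depends on its whole past through the cell it currently occupies.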



Definition 8-72: If $(f_n, \mathcal{R}_n)$ is a stochastic process, then the function
f defined on the states of a space-time chain by

$$f_{\langle U, n \rangle} = f_n(U)$$

is said to correspond to the process $(f_n, \mathcal{R}_n)$.

We write $(f_n, \mathcal{R}_n) \sim f$ when f corresponds to $(f_n, \mathcal{R}_n)$. If we identify
two stochastic processes $(f_n, \mathcal{R}_n)$ and $(g_n, \mathcal{R}_n)$ when $f_n = g_n$ a.e. for
every n, then the correspondence between all stochastic processes on
$\{\mathcal{R}_n\}$ and all functions on the states of the space-time chain is one-one.
We now restrict ourselves to the case where the $f_n$ are real-valued
functions. Under the following definition the correspondence preserves
inequalities, linear combinations, and limits.

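Definition 8-73: Operations on stochastic processes are defined by

(1) $(f_n, \mathcal{R}_n) \ge (g_n, \mathcal{R}_n)$ if $f_n \ge g_n$ a.e. for all n.

(2) $a(f_n, \mathcal{R}_n) + b(g_n, \mathcal{R}_n) = (af_n + bg_n, \mathcal{R}_n)$.

(3) $\lim_k (f_n^{(k)}, \mathcal{R}_n) = (f_n, \mathcal{R}_n)$ if $\lim_k f_n^{(k)} = f_n$ a.e. for all n.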

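Lemma 8-74: If $(f_n, \mathcal{R}_n) \sim f$, then

$$(M[f_{n+k} \mid \mathcal{R}_n], \mathcal{R}_n) \sim P^k f$$

in the sense that if either quantity is well defined, then so is the other;
and if they are both well-defined, then they correspond.

Proof: We shall proceed by induction on k. If k = 0, the result is
trivial. Suppose that both quantities exist for some k and that they
correspond. Then

$$(P^{k+1} f)_i = \sum_j P_{ij} (P^k f)_j$$

or

$$(P^{k+1} f)_{\langle U, n \rangle} = \sum_{V \subset U,\ V \in \mathcal{R}_{n+1}} \frac{\mu(V)}{\mu(U)} (P^k f)_{\langle V, n+1 \rangle}.$$

By inductive hypothesis, $(M[f_{n+k} \mid \mathcal{R}_n], \mathcal{R}_n) \sim P^k f$. Hence, by
definition of the correspondence,

$$(P^k f)_{\langle V, n+1 \rangle} = M[f_{n+1+k} \mid \mathcal{R}_{n+1}](V) = \frac{1}{\mu(V)} \sum_{W \subset V,\ W \in \mathcal{R}_{n+1+k}} \mu(W) f_{n+1+k}(W).$$

Therefore

$$(P^{k+1} f)_{\langle U, n \rangle} = \sum_{V \subset U,\ V \in \mathcal{R}_{n+1}} \frac{\mu(V)}{\mu(U)} \cdot \frac{1}{\mu(V)} \sum_{W \subset V,\ W \in \mathcal{R}_{n+1+k}} \mu(W) f_{n+1+k}(W)$$

$$= \frac{1}{\mu(U)} \sum_{W \subset U,\ W \in \mathcal{R}_{n+1+k}} \mu(W) f_{n+1+k}(W) = M[f_{n+k+1} \mid \mathcal{R}_n](U).$$

That is, $P^{k+1} f$ exists if and only if $M[f_{n+k+1} \mid \mathcal{R}_n]$ does, and if they
exist, then they correspond.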

Proposition 8-75: If $(h_n, \mathcal{R}_n) \sim h$, then $(h_n, \mathcal{R}_n)$ is a supermartingale
(martingale) if and only if h is superregular (regular) and $P^k h$ is finite-valued
for all k.

Proof: If h is superregular, then $Ph \le h$. Hence, by Lemma 8-74,
$M[h_{n+1} \mid \mathcal{R}_n] \le h_n$ for all n. If $P^k h$ is finite-valued, then

$$(P^k h)_0 = \sum_{U \in \mathcal{R}_k} \mu(U) h_k(U) = M[h_k]$$

is finite. Since $h_n$ is constant on cells of $\mathcal{R}_n$, Definition 3-5 is satisfied.

Conversely, if $(h_n, \mathcal{R}_n)$ is a supermartingale, then $M[h_{n+1} \mid \mathcal{R}_n] \le h_n$,
and hence $Ph \le h$ by Lemma 8-74. Moreover, $M[h_{k+n}] = (P^{k+n} h)_0$
is finite. If $i = \langle U, n \rangle$, then $P^{(n)}_{0i} = \mu(U) > 0$. From

$$(P^{k+n} h)_0 = \sum_j P^{(n)}_{0j} (P^k h)_j,$$

we see that $(P^k h)_i$ must be finite. The proof for martingales simply
replaces $Ph \le h$ by $Ph = h$.

Definition 8-76: $(f_n, \mathcal{R}_n)$ is a stochastic process charge with potential
$(g_n, \mathcal{R}_n)$ if $\sum_n M[|f_n|] < \infty$ and if $(g_n, \mathcal{R}_n) = \sum_k (M[f_{n+k} \mid \mathcal{R}_n], \mathcal{R}_n)$.
If, in addition, $f_n \ge 0$, then the potential is called a pure potential.

We shall make use of potential theory results for the space-time chain
P. As the distinguished measure $\alpha$, we select $\alpha_j = N_{0j} > 0$. As
usual, if $\alpha f$ is finite, then f is a charge.

Proposition 8-77: Charge functions correspond to charge stochastic 
processes, and their potentials also correspond. 

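Proof: If $(f_n, \mathcal{R}_n) \sim f$, then by Proposition 8-71

$$\alpha|f| = \sum_{\langle U, n \rangle} \mu(U) \cdot M[|f_n| \mid \mathcal{R}_n](U)$$

$$= \sum_n \sum_{U \in \mathcal{R}_n} \mu(U) \cdot |f_n(U)|$$

$$= \sum_n M[|f_n|].$$

Since sums and limits are preserved, we have by Lemma 8-74

$$g = Nf = \sum_k P^k f \sim \sum_k (M[f_{n+k} \mid \mathcal{R}_n], \mathcal{R}_n).$$

Thus g is the potential of f if and only if the two conditions of Definition
8-76 are fulfilled by $(f_n, \mathcal{R}_n)$ and $(g_n, \mathcal{R}_n)$.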

We give two applications of the correspondence. The first is a 
decomposition of non-negative supermartingales. 



Proposition 8-78: If $(h_n, \mathcal{R}_n)$ is a non-negative supermartingale, then
there is a unique representation

$$(h_n, \mathcal{R}_n) = (r_n, \mathcal{R}_n) + (g_n, \mathcal{R}_n)$$

of $(h_n, \mathcal{R}_n)$ as the sum of a martingale and a potential. In the
representation the martingale is a non-negative martingale, the potential is a
pure potential, and $r_n$ satisfies $r_n = \lim_k M[h_{n+k} \mid \mathcal{R}_n]$. Moreover,
$(g_n, \mathcal{R}_n)$ is the difference between a martingale and a process consisting
of an increasing sequence of random variables.

Proof: Existence and uniqueness of the representation follow
immediately from Theorem 5-10 and Propositions 8-75 and 8-77. Then
$r_n = \lim_k M[h_{n+k} \mid \mathcal{R}_n]$ by Lemma 8-74. For the last part, let
$(f_n, \mathcal{R}_n)$ be the charge of $(g_n, \mathcal{R}_n)$. Set

$$s_n = f_0 + \cdots + f_{n-1}.$$

Then $s_n$ increases monotonically to s, and

$$M[s] \le \sum_n M[f_n] < \infty$$

by monotone convergence. Since $f_n \ge 0$,

$$g_n = \sum_k M[f_{n+k} \mid \mathcal{R}_n] = M\Big[\sum_k f_{n+k} \,\Big|\, \mathcal{R}_n\Big]$$

$$= M[s - s_n \mid \mathcal{R}_n] = M[s \mid \mathcal{R}_n] - s_n.$$

Hence $(g_n, \mathcal{R}_n)$ is the difference between the martingale $(M[s \mid \mathcal{R}_n], \mathcal{R}_n)$ and the
increasing sequence $(s_n, \mathcal{R}_n)$.



As the second application, we give a proof of the Upcrossing Lemma, 
Proposition 3-11, as it applies to non-negative supermartingales. The 
present estimate is better than the one in Chapter 3. 



Proposition 8-79: Let r and s be real numbers with 0 < r < s. Let
$\beta(\omega)$ be the number of upcrossings on $\omega$ of [r, s] by the non-negative
supermartingale $(f_k, \mathcal{R}_k)$ up to time n. Then

$$M[\beta] \le \frac{r}{s - r}.$$

Proof: Let $(f_k, \mathcal{R}_k) \sim f$; f is non-negative superregular. Let E and
F be the sets of states in the space-time chain defined by

$$E = \{\langle U, m \rangle \mid m \le n \text{ and } f_m(U) \le r\}$$

$$F = \{\langle U, m \rangle \mid m \le n \text{ and } f_m(U) \ge s\}.$$

Hence $f_i \le r$ for $i \in E$ and $f_j \ge s$ for $j \in F$. For any other state
$i = \langle U, m \rangle$ with $m \le n$, $r < f_i < s$. Now $h_0^F = (B^F 1)_0$ is the probability
that the random variables $f_0, \ldots, f_n$ ever take on a value greater
than or equal to s. Similarly, $(B^E B^F 1)_0$ is the probability of at least
one upcrossing, and in general $[(B^E B^F)^k 1]_0$ is the probability of at
least k upcrossings by $f_0, f_1, \ldots, f_n$. Hence

$$M[\beta] = \sum_{k=1}^{\infty} \Pr[\beta \ge k] = \sum_{k=1}^{\infty} [(B^E B^F)^k 1]_0.$$

Since the chain cannot be in F after time n, F is an equilibrium set,
and $h^F = B^F 1$ is a potential. For $i \in F$,

$$(B^F 1)_i = 1 \le \frac{1}{s} f_i.$$

By the Principle of Domination (Theorem 8-45),

$$B^F 1 \le \frac{1}{s} f$$

everywhere. Hence $B^E B^F 1 \le (1/s) B^E f$. For $i \in E$,

$$(B^E f)_i \le r.$$

Since f is superregular and since r1 is superregular,

$$B^E f \le r1$$

everywhere by conclusion (2) of Proposition 8-16. Thus

$$B^E B^F 1 \le \frac{r}{s} 1.$$

By induction,

$$(B^E B^F)^k 1 \le \left(\frac{r}{s}\right)^k 1,$$

and hence

$$M[\beta] \le \sum_{k=1}^{\infty} \left(\frac{r}{s}\right)^k = \frac{r}{s - r}.$$
10. Problems

1. If P is a transient chain whose states communicate and if a > 0 is 
superregular, show that 

(a) P stops (disappears) a.e. if and only if a is a potential measure. 

(b) The mean stopping time is finite if and only if a is a charge. [Hint: 
Adjoin an absorbing state to P.] 

2. Let P be a recurrent chain, and let E be a finite set. Show that for any
specified values of h on E there is a unique bounded function h with the
specified values which is regular on $\tilde{E}$. Show that h takes on its
maximum on E.

3. Illustrate the result of the previous problem for the symmetric random 
walk on the integers with the specified values Kq = 0 and = 1. 

4. Let P have only transient states, let $\pi$ be a specified probability vector,
and let $\alpha = \pi N$. Let $f \ge 0$ be a charge, and define the random variable

$$s = \sum_{n=0}^{\infty} f(x_n).$$

Show that s is finite a.e. and that $M_\pi[s] = \alpha f$. What is the value of
$M_i[s]$? Give a game-interpretation.

5. In the framework of Problem 4, introduce a second charge /, its potential 
g, and s. Prove that if /x = dual /, then 

(g,g) = iM;,[ss] + 

Find the corresponding expression for 1 ( 9 ^). 

6. If P is a chain with only transient states, let $\mathrm{Var}_{ij}$ be the variance of $n_j$
for the process started at i. Show that

$$\mathrm{Var}_{ij} = N_{ij}(2N_{jj} - N_{ij} - 1).$$

7. In the framework of Problem 6, if P1 = 1 , prove that for each / there is 
an i such that Var^y > N^j. 

Problems 8 to 19 refer to the following Markov chain: The states are the 
non-negative integers. From state i either the process moves one step to the 
right with probability Pi > 0, or it remains at i. 

8. Find H and N. 

9. Give a simple characterization of

(a) the regular functions,

(b) the non-negative superregular functions,

(c) the pure potentials, where $\alpha_j = N_{0j}$,

(d) the potentials, where $\alpha_j = N_{0j}$.

10. What does Theorem 5-10 say about this chain? 

11. If gr is a potential with charge /, give a simple characterization in terms 
of g of the support of /. Of the support of / + . 

12. Use Problems 9 and 11 to verify that Theorem 8-45 holds for this chain. 
[Hint: Distinguish the cases where the support of is finite and where 
it is infinite.] 
13. Use Problems 9 and 11 to construct a counterexample to Theorem 8-45 
if the assumption ^ > 0 is omitted. 

14. If form P and compute N. 

15. For ^ = {0, 1, 2}, find Ti^. [See the end of Section 8.] Show that 

= Nh^ has the desired interpretation. [Remember that the chain 
starts at time 0, not at time 1.] 

16. Show that C(E) = 1 for all equilibrium sets. Show that both the 
hypothesis and the conclusion of Proposition 8-39 are false for P. 

17. Show thatU(P) = 1 for all sets E. Show that both the hypothesis and 
the conclusion of Proposition 8-39 hold for P. 

18. Let E — {0, 1, . . . , n}. Find If ^ is a potential, what is the form 
of its balayage potential on P ? Show that Theorem 8-46 is satisfied. 

19. Show that if /x = dual /, then 

1(9^) = 2 (2 2 ^ 

Problems 20 to 30 refer to sums of independent random variables on the 

integers with \ and = f. Use the results of Problems 12 to 19 in 

Chapter 5. 

20. Show that there are two essentially different positive regular measures 
and that all regular measures are linear combinations of the two basic 
measures. 

21. Show that if $f \ge 0$ and $\alpha f < +\infty$ for either $\alpha$, then $g = Nf$ is finite-valued.
Show also that $\lim_n P^n g = 0$.

22. Let E = {0, 1, . . . , Compute Choose a non-negative super- 

regular function Ji (not a constant), and verify the various parts of 
Proposition 8-16. 

23. For E as above, compute $e^E$ and verify that $Ne^E = h^E$.

24. For E as above, compute $C(E) = \alpha e^E$ for each of the two basic measures.
What happens as n increases?

25. Form P and compute N for each of the two basic measures. In each case, 
does N have columns tending to 0 ? 

26. Use the results of the last two problems to show that each assignment of 
capacities is consistent with Proposition 8-39, even though lim^ C'(P) 
is finite in one case and not in the other. 

27. Show that there are infinite equilibrium sets for this chain. 

28. Choose « = U. Prove that if lim,^_^ + oo = 0 and 2 l^i ~ ^i-i| < 
then h is a, potential. 

29. Choose a function h satisfying the conditions of Problem 28, compute its 
charge /, and check that Ji = Nf. 

30. Let P = {0, 1, 2}. Compute and the fundamental matrix of this 
finite chain. Verify that the latter is iV^. 
1. Potentials

Throughout this chapter P is a recurrent chain which is either null or 
noncyclic ergodic. For such a chain, lim^ P^ always exists; we let 
L = lim P^. 

In a recurrent chain the non-negative finite-valued superregular 
measures are uniquely determined up to multiplication by a constant, 
and the non-zero ones are positive and regular. We choose one such 
non-zero regular measure and call it a. 

If P is noncyclic ergodic, then $L_{ij} = \alpha_j / \sum_k \alpha_k$, whereas if P is null,
then L^j = 0. In either case, = 0. 

Duality for P is defined with respect to the regular measure $\alpha$. The
dual $\hat{P}$ of a null chain is null, and the dual of a noncyclic ergodic chain
is noncyclic ergodic. In general, if two results are duals, we shall prove
only one of the pair. As usual, the key to the proof by duality of the
second result is that $\hat{P}$ is the most general chain of the type we consider
in this chapter.

As Definition 9-1 suggests, we define charges and potentials in the 
same way as in transient potential theory. 



Definition 9-1: If $\mu$ is a signed measure with $\mu 1$ finite and if

$$\nu = \lim_n [\mu(I + P + \cdots + P^{n-1})]$$

exists and is finite-valued, then $\mu$ is called a left charge with potential
measure $\nu$ and total charge $\mu 1$. If f is a function with $\alpha f$ finite and if
$g = \lim_n [(I + P + \cdots + P^{n-1})f]$ exists and is finite-valued, then f
is called a right charge with potential function g and total charge $\alpha f$.
The support of a charge is the set on which the charge is not zero; the
support of a potential is the support of its charge.

The definition of the support of a potential is not justified until we
prove a uniqueness theorem for the charge of a potential, but such a
result will follow directly from conclusion (2) of Theorem 9-15.

We note that the dual of a left (right) charge for P is a right (left)
charge for $\hat{P}$ and that their total charges are the same (since the dual of a
number is the same number).

Although we adopt the same definitions as with transient chains, the
results are sometimes significantly different. For example, the only
pure potential is the zero potential: If $f \ge 0$, then by monotone
convergence $\lim_n [(I + P + \cdots + P^{n-1})f]_i = \sum_j N_{ij} f_j$, where $N_{ij} =
+\infty$ for every i and j. Thus the limit is finite-valued only if f = 0.

On the other hand, every row (or column) of I - P is a charge, and
the potential of the ith row (column) of I - P is the ith row (column)
of I - L. For if $\mu$ is the ith row of I - P, then $|\mu| 1 \le 2 < \infty$ and

$$\nu_j = \lim_n [\mu(I + P + \cdots + P^{n-1})]_j = \lim_n (I - P^n)_{ij} = (I - L)_{ij};$$

the assertion for columns is dual.
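A small numerical illustration may help here. The sketch below is not from the text: it assumes a particular three-state noncyclic ergodic chain `P` with regular measure `alpha = (1, 2, 1)`, truncates the limit after a fixed number of terms, and uses hypothetical helper names. It forms the partial sums $\mu(I + P + \cdots + P^{n-1})$ for each row $\mu$ of I - P and compares them with the corresponding row of I - L.

```python
def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[k] * A[k][j] for k in range(len(v))) for j in range(len(A[0]))]


# Hypothetical noncyclic ergodic chain; alpha P = alpha holds for alpha = (1, 2, 1).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
alpha = [1.0, 2.0, 1.0]
L_row = [a / sum(alpha) for a in alpha]   # every row of L = lim P^n


def left_potential(mu, terms=80):
    """Partial sum mu (I + P + ... + P^(terms-1)), used as a stand-in for the limit."""
    nu = [0.0] * len(mu)
    term = list(mu)
    for _ in range(terms):
        nu = [a + b for a, b in zip(nu, term)]
        term = vec_mat(term, P)
    return nu


rows_of_I_minus_P = [[float(i == j) - P[i][j] for j in range(3)] for i in range(3)]
potentials = [left_potential(mu) for mu in rows_of_I_minus_P]
rows_of_I_minus_L = [[float(i == j) - L_row[j] for j in range(3)] for i in range(3)]
total_charges = [sum(mu) for mu in rows_of_I_minus_P]
```

Each row of I - P sums to zero in this example, which is the finite-chain instance of the statement that every charge has total charge zero.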

Our first potential theory result will be that every charge has total 
charge zero. To prove this fact, we require the Doeblin Ratio Limit 
Theorem, Theorem 9-4. We recall that is the probability starting 
in i of reaching j before or at time n and that is the mean number 
of times the process started at i is in j up to and including time n. 
Hence H\]'^ = 2fc = o ^ 2lc = o from the latter 

relation we see that In terms of this notation the 

Doeblin Ratio Limit Theorem states that in any Markov chain with a 
positive superregular measure 



lim 

n 






exists and is finite for any states i, j, and j' which communicate. 
We shall give a simple proof of this important result. 



Lemma 9-2: Let P be any Markov chain with a positive superregular 
measure a. If i and j communicate, then the quantities 



]Sf(n) _ ^(n) 



and 



- Nlf 



are non-negative and bounded. In particular, \N^/}^ — 
Proof: If i and j communicate, then < 1, so that = 
1/(1 — ^Hjj) < 00 . It is clear that 



> 0 

and that 



and hence the first expression is non-negative and bounded by Wyy < oo. 
By duality, is non-negative and bounded; if we 

multiply by aja^ and interchange j and i, we obtain the second result. 



Lemma 9-3: Let P be any Markov chain with a positive super- 
regular measure a. If i and j communicate, then for all n 



and 



A7(n) 
n iVyy 



iV<^) 



< 1 and 






< -!■, 



= 



and lim 



N\r 






Proof: By Lemma 9-2, 



0 < < c for all n. 



Hence 






0 < 1 < 



Therefore < 1 for all n. If j is recurrent, then -foo, 

and the ratio must tend to 1; we have = 1 since i must be 

recurrent if i and j communicate. Hence 









If j is transient, then 








H„N,; 


N<r 







The other results are duals, and the assertion about follows from 
Corollary 6-20. 



The following is the Ratio Limit Theorem. 



Theorem 9-4: Let P be any Markov chain with a positive superregular
measure $\alpha$, and let i, j, i', and j' be any states which communicate.
Then

$$\lim_n \frac{N^{(n)}_{ij}}{N^{(n)}_{i'j'}}$$

exists. If all four states are recurrent, the limit is $\alpha_j/\alpha_{j'}$. If the states
are transient, the limit is

$$\frac{N_{ij}}{N_{i'j'}}.$$
Remark: Since the states in question communicate, they must be 
either all recurrent or all transient. 



Proof: Write 



Wr 





- 1 






-1 


pifi 


ml 













and apply Lemma 9-3 to each factor. 
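The recurrent case of the Ratio Limit Theorem can be observed numerically on a finite chain. The following sketch rests on assumptions that are not in the text: a specific three-state ergodic chain with regular measure `alpha = (1, 2, 1)`, a fixed horizon `n = 5000`, and hypothetical helper names. It accumulates $N^{(n)} = I + P + \cdots + P^n$ and forms ratios of its entries.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical recurrent (ergodic, noncyclic) chain with alpha P = alpha.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
alpha = [1.0, 2.0, 1.0]


def partial_visit_sums(P, n):
    """N^(n) = I + P + ... + P^n: mean visits up to and including time n."""
    size = len(P)
    power = [[float(i == j) for j in range(size)] for i in range(size)]
    total = [row[:] for row in power]
    for _ in range(n):
        power = mat_mul(power, P)
        total = [[a + b for a, b in zip(tr, pr)] for tr, pr in zip(total, power)]
    return total


Nn = partial_visit_sums(P, 5000)


def ratio(i, j, i2, j2):
    """N^(n)_{ij} / N^(n)_{i2 j2} at the fixed horizon used above."""
    return Nn[i][j] / Nn[i2][j2]
```

The ratios depend only on the column indices in the limit, and the convergence is slow (of order 1/n), so the comparison with $\alpha_j/\alpha_{j'}$ is only approximate at any finite horizon.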

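Proposition 9-5: Every charge has total charge zero.

Proof: If $\mu$ is a left charge, then $\sum_k |\mu_k| < \infty$ and

$$\nu_j = \lim_n \sum_k \mu_k N^{(n)}_{kj}$$

is finite. Therefore

$$\lim_n \sum_k \mu_k \frac{N^{(n)}_{kj}}{N^{(n)}_{jj}} = 0.$$

Since by Lemma 9-3, $N^{(n)}_{kj}/N^{(n)}_{jj} \le 1$, dominated convergence gives

$$0 = \lim_n \sum_k \mu_k \left(\frac{N^{(n)}_{kj}}{N^{(n)}_{jj}}\right) = \sum_k \mu_k \lim_n \left(\frac{N^{(n)}_{kj}}{N^{(n)}_{jj}}\right) = \sum_k \mu_k = \mu 1.$$

The result for functions is dual.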

The condition that a function / satisfy af = 0 is a strong necessary 
condition for it to be a charge, but it is by no means sufficient. In 
fact, it is not even sufficient in general if / also has finite support. We 
shall return to discuss this point at length in Section 2. 

We now establish as Theorem 9-7 an identity which will play a 
fundamental role when we develop an operator which transforms 
charges into potentials. 

Lemma 9-6: Let $\{a_n\}$ and $\{b_n\}$ be two sequences of real numbers
such that $a_n \ge 0$, $\sum a_n = a < \infty$, $|b_n| \le B$, and $|b_n - b_{n-1}| \to 0$.
Then

$$\lim_{n \to \infty} \sum_{k=0}^{n} a_k (b_n - b_{n-k}) = 0.$$

Proof: Let $\epsilon > 0$ be given. Choose N sufficiently large that
^k>N pick large enough so that for all n > N' 

|6„ - 6„_i| < 

Then for n > N N\ we have 



fc = 0 



n, iv n 

2 ~~ ^n-k) — ^k\^n ^ ^n-k\ "t 2 ^fc(|^n| + |^n-fc|) 

k=0 k=N+l 

N k-1 n 

— 2 2 ~ “t 2 



fc = 0 ; = 0 

N k-1 



k = N + l 



^ .2 2 ^ n-j>N' 



k=0 j=0 

N 



- ^ . 2 . + 2 



< e. 



k = 0 



Theorem 9-7: Let i, j, and k be arbitrary states in a recurrent Markov
chain which is either null or noncyclic ergodic. Then

$$\lim_{n \to \infty} \left[(N^{(n)}_{kk} - N^{(n)}_{ik})\,\alpha_j/\alpha_k + N^{(n)}_{ij} - N^{(n)}_{kj}\right] = {}^k N_{ij}.$$

Proof: We may assume that neither i nor j equals k, since otherwise 
both sides are clearly zero. We begin by establishing four equations: 



00 



(1) 


v = 0 


(2) 


= 2 

v = 0 


(3) 


Niy = 2 

v = 0 


(4) 


N\r = 2 

v = 0 


Equations (1) and (3) follow from the fact that 2 ^\k = ^ik — 1* 
Equation (2) comes from Theorem 4-11 with the random time t = 
min (t;^, n), and equation (4) is a similar result, except that the sum has 
been broken into two parts representing what happens after and before 
state k is reached for the first time. 
Multiply (1) by $\alpha_j/\alpha_k$, (2) by $-\alpha_j/\alpha_k$, (3) by -1, and (4) by 1, and add.
We obtain

m - m 

= + 2 FWibn - 6„-v) + i 

v =0 v=n+l 

where {^n} is a bounded sequence by Lemma 

9 - 2 . The first term on the right side tends to and the third 
term tends to zero since is bounded and T is finite. It is thus 
sufficient to show that 

n 

lim y a^( 6 „ - = 0 , 

n v = o 

where Since > 0 , 2«n = s^nd { 6 „} is bounded, we 

need show only that ( 6 ;^ — 6 n-i) ^ ^ appiy Lemma 9-6. But 

hn - K-1 = m ? - TO - 

_ pin) ^ _ pin) 

~~ -^fck ^ ^kj 

CCk 

T T 

^k 

= 0 . 
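The identity of Theorem 9-7 can be tested numerically on a finite ergodic chain. The sketch below makes several assumptions that are not in the text: the three-state chain `P`, its regular measure `alpha`, the truncation lengths, and the helper names. It evaluates the bracketed expression at a finite n and compares it with the matrix ${}^k N$ of mean visits before reaching k, computed as a series over the chain with row and column k removed.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
alpha = [1.0, 2.0, 1.0]   # regular measure of this hypothetical chain


def partial_visit_sums(P, n):
    """N^(n) = I + P + ... + P^n."""
    size = len(P)
    power = [[float(i == j) for j in range(size)] for i in range(size)]
    total = [row[:] for row in power]
    for _ in range(n):
        power = mat_mul(power, P)
        total = [[a + b for a, b in zip(tr, pr)] for tr, pr in zip(total, power)]
    return total


def taboo_fundamental(P, k, terms=400):
    """kN: mean visits before reaching k; row k and column k are zero."""
    size = len(P)
    Q = [[0.0 if k in (i, j) else P[i][j] for j in range(size)] for i in range(size)]
    power = [[1.0 if (i == j and i != k) else 0.0 for j in range(size)]
             for i in range(size)]
    total = [[0.0] * size for _ in range(size)]
    for _ in range(terms):
        total = [[a + b for a, b in zip(tr, pr)] for tr, pr in zip(total, power)]
        power = mat_mul(power, Q)
    return total


Nn = partial_visit_sums(P, 200)


def lhs(i, j, k):
    """Bracketed expression of Theorem 9-7 at the finite horizon n = 200."""
    return (Nn[k][k] - Nn[i][k]) * alpha[j] / alpha[k] + Nn[i][j] - Nn[k][j]


kN = [taboo_fundamental(P, k) for k in range(3)]
```

Setting i = j in `lhs` and multiplying by `alpha[k]` gives an expression symmetric in i and k, which is the observation behind the second proof of Lemma 9-8.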

If in the recurrent chain P we make a set of states E absorbing, then
the resulting chain is an absorbing chain and results about transient chains may be
applied to it. For example, the result $N_{ij} = H_{ij} N_{jj}$ yields ${}^E N_{ij} =
{}^E H_{ij}\,{}^E N_{jj}$. We shall also make frequent use of the fact noted in Section
6-2 that ${}^E \hat{N}_{ij} = (\alpha_j/\alpha_i)\,{}^E N_{ji}$.

At this point we begin developing the machinery needed for the
main result of this section, Theorem 9-15. We first need two preliminary
identities, which we establish as Propositions 9-12 and 9-13.

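Lemma 9-8: For any pair of states i and j,

$$\alpha_j\,{}^j N_{ii} = \alpha_i\,{}^i N_{jj}.$$

Proof 1: If i = j, both sides are zero. If $i \ne j$, then from $\bar{H}_{jj} = 1$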
Proof 1 : If i = j, both sides are zero. If i 7 ^ j, then from Hjj == 1 
and (3) of Lemma 4-19, we find 

‘S,, + = H„ = 1, 

1 - = m,,. 



SO that 
Hence

%, = 1/(1 - ‘ 5 ,,) = 

Therefore, by Proposition 5-4 and Corollary 6-21, 

_ jfj 

XV Jj- iXj 

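Proof 2: Set i = j in Theorem 9-7 and multiply by $\alpha_k$. Then

$$\lim_n \left[(N^{(n)}_{kk} - N^{(n)}_{ik})\,\alpha_i + (N^{(n)}_{ii} - N^{(n)}_{ki})\,\alpha_k\right] = \alpha_k\,{}^k N_{ii}.$$

Interchange i and k, and the left side stays the same. The right side
becomes $\alpha_i\,{}^i N_{kk}$.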

Lemma 9-9: For any states i, j, k with i / j, 






Proof: 



J <iV,, J 

OCy «y 

= by Lemma 9-8 

= 



Lemma 9-10: 









Proof: If i = 0, both sides are zero. Otherwise we have 
If we multiply through by ajaj^, we obtain 



Hence 



NiV ^ + -F ^ 






fcO 



^ki- 



Interchanging i and 0 and multiplying by ocjao gives 
^iVr<n) _ Mn) < 



«0 
Therefore,

^ ^ by Lemma 9-9. 

tto OCq 

As usual, we agree that ${}^E N$, and in particular ${}^0 N$, is a square matrix
indexed by the set of all states; the entries of ${}^E N$ on the rows or
columns indexed by E are all zeros.

Lemma 9-11: If $\mu 1$ is finite, then $\mu\,{}^0 N$ is finite-valued. Dually, if $\alpha f$
is finite, then ${}^0 N f$ is finite-valued.

Proof: If $\mu 1$ is finite, then

$$\sum_i |\mu_i\,{}^0 N_{ij}| \le {}^0 N_{jj} \sum_i |\mu_i| < \infty.$$
i i i 

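Proposition 9-12: If $\mu 1 = 0$, then

$$\lim_{n \to \infty} \sum_k \mu_k \left[N^{(n)}_{kj} - \frac{\alpha_j}{\alpha_0} N^{(n)}_{k0}\right] = \sum_k \mu_k\,{}^0 N_{kj}.$$

Dually, if $\alpha f = 0$, then

$$\lim_{n \to \infty} \sum_k \left[N^{(n)}_{ik} - N^{(n)}_{0k}\right] f_k = \sum_k {}^0 N_{ik} f_k.$$

Proof: Let

$$S^{(n)}_{kj} = \left[N^{(n)}_{kj} - N^{(n)}_{0j}\right] + \frac{\alpha_j}{\alpha_0}\left[N^{(n)}_{00} - N^{(n)}_{k0}\right].$$

Since $\mu 1 = 0$, we have

$$\sum_k \mu_k S^{(n)}_{kj} = \sum_k \mu_k \left[N^{(n)}_{kj} - \frac{\alpha_j}{\alpha_0} N^{(n)}_{k0}\right].$$

By Theorem 9-7, $\lim_n S^{(n)}_{kj} = {}^0 N_{kj}$. Hence if we can prove that $S^{(n)}_{kj}$
is bounded independently of n and k, then, since $\mu 1$ is finite, the result
of the proposition follows by dominated convergence. (Note that
$\sum_k \mu_k\,{}^0 N_{kj}$ is absolutely convergent by Lemma 9-11.) We have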



m 



The first term on the right is bounded according to Lemma 9-2, and the 
second term is bounded according to Lemma 9-10. 
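Proposition 9-13: If $\alpha f = 0$, then f is a charge if and only if

$$\lim_n [P^n({}^0 N f)]$$

exists and is finite-valued. If f is a charge, then its potential g satisfies

$$g = {}^0 N f - \lim_n [P^n({}^0 N f)].$$

Proof: We have

$$(P\,{}^0 N)_{ij} = \begin{cases} 0 & \text{if } j = 0 \text{ since } {}^0 N_{k0} = 0 \\ \alpha_j/\alpha_0 & \text{if } i = 0 \text{ and } j \ne 0 \\ {}^0 N_{ij} - \delta_{ij} & \text{if } i \ne 0 \text{ and } j \ne 0. \end{cases}$$

So

$$((I - P)\,{}^0 N)_{ij} = \begin{cases} 0 & \text{if } j = 0 \\ -\alpha_j/\alpha_0 & \text{if } i = 0 \text{ and } j \ne 0 \\ \delta_{ij} & \text{if } i \ne 0 \text{ and } j \ne 0. \end{cases}$$

By Lemma 9-11, ${}^0 N f$ is finite-valued, and hence associativity holds in
the triple product $(I - P)\,{}^0 N f$. We find

$$((I - P)\,{}^0 N f)_i = \begin{cases} \sum_{j \ne 0} \delta_{ij} f_j = f_i & \text{if } i \ne 0 \\ \sum_{j \ne 0} \left(-\dfrac{\alpha_j}{\alpha_0}\right) f_j = \dfrac{\alpha_0 f_0}{\alpha_0} = f_0 & \text{if } i = 0. \end{cases}$$

The next to last equality uses the fact that $\alpha f = 0$. We see therefore
that $(I - P)\,{}^0 N f = f$, so that

$$\lim_{n \to \infty} (I + P + \cdots + P^{n-1}) f = \lim_{n \to \infty} (I + P + \cdots + P^{n-1})[(I - P)\,{}^0 N f]$$

$$= \lim_{n \to \infty} (I - P^n)\,{}^0 N f.$$

Associativity where required in the last equation follows from the
distributive property, which holds because ${}^0 N f$ is finite-valued.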



In the discussion that follows, E and F denote non-empty subsets of 
states. 



Lemma 9-14: $B^E\,{}^FN = {}^FN - {}^EN$, provided $F \subset E$.

Proof: By Theorem 4-11 with the time to reach E as the random
time, we find that $(B^E\,{}^FN)_{ij}$ is the mean of the total
number of times that the process is in state j before F, counting from
the time when E is first entered. This mean is the difference of the
number of times in j before F and the number of times in j before E,
since F cannot be entered unless E is entered.
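The identity of Lemma 9-14 can be checked numerically on a small finite chain. The sketch below is only an illustration under stated assumptions: the 4-state matrix `P`, the sets `E` and `F`, and the helper names `taboo_matrix`, `hitting_matrix`, and `matmul` are hypothetical and do not come from the text. The taboo matrix is approximated by summing powers of P restricted to the states outside the taboo set.

```python
# Hypothetical 4-state irreducible chain (an assumption for illustration only).
P = [
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.25, 0.50, 0.00],
    [0.00, 0.30, 0.30, 0.40],
    [0.20, 0.00, 0.30, 0.50],
]


def taboo_matrix(P, E, sweeps=3000):
    """Mean number of visits to j before reaching E, as a full square matrix.

    Rows and columns indexed by E are zero, as agreed in the text.
    After m sweeps of N <- I + Q N the result is the partial sum of Q^k, k < m,
    where Q is P restricted to the states outside E.
    """
    n = len(P)
    outside = [i for i in range(n) if i not in E]
    N = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        new = [[0.0] * n for _ in range(n)]
        for i in outside:
            for j in outside:
                new[i][j] = (1.0 if i == j else 0.0) + sum(
                    P[i][k] * N[k][j] for k in outside
                )
        N = new
    return N


def hitting_matrix(P, E):
    """B^E: probability that the chain started at i first enters E at j."""
    n = len(P)
    N = taboo_matrix(P, E)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in E:
            if i in E:
                B[i][j] = 1.0 if i == j else 0.0
            else:
                # Last state visited outside E is k, then a step from k into j.
                B[i][j] = sum(N[i][k] * P[k][j] for k in range(n))
    return B


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


E, F = {0, 1}, {0}  # F is a subset of E, as the lemma requires
NE, NF = taboo_matrix(P, E), taboo_matrix(P, F)
lhs = matmul(hitting_matrix(P, E), NF)
gap = max(abs(lhs[i][j] - (NF[i][j] - NE[i][j])) for i in range(4) for j in range(4))
print("largest deviation from the identity of Lemma 9-14:", gap)
```

For rows indexed by E the check is immediate, since the corresponding row of the hitting matrix is a unit vector and the corresponding row of the taboo matrix for E is zero; the content of the lemma lies in the rows outside E.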

Theorem 9-15: If f is a function with $\alpha f = 0$ and if

$$\lim_n\, [(I + P + \cdots + P^{n-1}) f]_0$$

exists and is finite for some state 0, then f is a charge. If g is its
potential, then

(1) $g = {}^0Nf + g_0 1$.

(2) $f = (I - P)g$.

(3) $P^n g \to 0$.

(4) $B^E g = g$ if the support of f is contained in E.

(5) $f_E = (I - P^E) g_E$ if the support of f is contained in E.

Proof: By Proposition 9-12, if $\lim_n \sum_k N^{(n)}_{0k} f_k$ exists, then

$$\lim_n \sum_k \left[ N^{(n)}_{ik} - N^{(n)}_{0k} \right] f_k = \sum_k {}^0N_{ik} f_k$$

or

$$\left( \lim_n \sum_k N^{(n)}_{ik} f_k \right) - \left( \lim_n \sum_k N^{(n)}_{0k} f_k \right) = ({}^0Nf)_i.$$

Hence $\lim_n \sum_k N^{(n)}_{ik} f_k$ exists, and f is a charge.

(1) Therefore $g_i - g_0 = ({}^0Nf)_i$, and (1) follows.

(2) From the proof of Proposition 9-13 we have $(I - P)\,{}^0Nf = f$,
since $\alpha f = 0$. Thus if we multiply (1) by $I - P$, we get (2).

(3) By (2) we have $g = Pg + f$, and, since $P^k f$ is finite-valued, we
see by induction that $P^k g$ is finite-valued and that

$$P^{k-1} g = P^k g + P^{k-1} f.$$

Adding these relations for $k = 1, \ldots, n$, we obtain

$$g = P^n g + (I + P + \cdots + P^{n-1}) f.$$

Hence $P^n g \to 0$.

(4) Let $0 \in E$. By Lemma 9-14 with $F = \{0\}$,

$$B^E\,{}^0Nf = {}^0Nf - {}^ENf,$$

and by (1)

$$\begin{aligned} B^E g &= B^E\,{}^0Nf + B^E g_0 1 \\ &= {}^0Nf - {}^ENf + g_0 1 && \text{since } B^E 1 = 1 \\ &= g - {}^ENf && \text{by (1).} \end{aligned}$$

Since f has support in E, ${}^ENf = 0$ because ${}^EN_{ij} = 0$ when $j \in E$.

(5) By Proposition 6-17,

$$f = (I - P)g = (I - P)(B^E g) = [(I - P)B^E]g = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} g_E \\ g_{\tilde E} \end{pmatrix}.$$

We note from conclusion (2) that a potential uniquely determines its
charge.

Corollary 9-16: If $(I - P)g = f$, then g is a potential if and only if $\alpha f$
is finite and $P^n g \to 0$.

Proof: If g is a potential, then $P^n g \to 0$ by conclusion (3) of Theorem
9-15 and f is the charge by conclusion (2). Hence $\alpha f$ is finite by
definition. Conversely, if $(I - P)g = f$, we have by induction

$$g = P^n g + (I + P + \cdots + P^{n-1}) f.$$

If $P^n g \to 0$, then $g = \lim_n [(I + P + \cdots + P^{n-1}) f]$; if $\alpha f$ is finite, then
g is a potential by definition.

We already know that the columns of $I - L$ are potentials whose
charges are the corresponding columns of $I - P$. Corollary 9-16
allows us to enlarge this result as follows.

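Corollary 9-17: If P is a null chain and if g is a function for which $\alpha g$
is finite, then g is a potential.

Proof: We shall apply Corollary 9-16. By writing $g = g^+ - g^-$,
we may assume that $g \ge 0$. Then

$$(P^n g)_i = \sum_j P^{(n)}_{ij} g_j = \frac{1}{\alpha_i} \sum_j \alpha_j g_j \hat P^{(n)}_{ji}.$$

Since $\alpha g$ is finite and since $\hat P^{(n)}_{ji}$ is bounded by one, we have by dominated
convergence

$$\lim_n (P^n g)_i = \frac{1}{\alpha_i} \sum_j \alpha_j g_j \lim_n \hat P^{(n)}_{ji} = 0.$$

Therefore $P^n g \to 0$. Set $f = (I - P)g$. Then

$$\alpha |f| \le \alpha (g + Pg) = \alpha g + \alpha (Pg) = \alpha g + (\alpha P) g = \alpha g + \alpha g < \infty.$$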

A corresponding result for noncyclic ergodic chains will be proved in 
Section 3. 
2. Normal chains

The condition at the end of Section 1 that g is a potential in a null
chain if $\alpha g$ is finite does not provide a sufficiently large class of potentials
for a satisfactory potential theory. In this section we shall impose the
condition on P that any function f of finite support with $\alpha f = 0$ be a
charge and that the corresponding result for signed measures hold; such
a chain will be called normal. The justification for considering normal
chains will consist in our showing that the class of normal chains is
quite extensive; we shall see in Section 3, for example, that all noncyclic
ergodic chains are normal.

Our procedure in introducing normal chains will be not to define
them as above but to give a definition which is computationally simpler
to check. The point of departure is the identity

$$g = {}^0Nf - \lim_n \left[ P^n({}^0Nf) \right]$$

of Proposition 9-13.

Proposition 9-18: If for each j there is some i such that $\lim_n (P^n\,{}^0N)_{ij}$
exists, then $\lim_n (P^n\,{}^0N)$ exists and has constant columns which are
finite-valued.

Proof: Let j be a fixed state and let

$$f_k = \begin{cases} \alpha_j/\alpha_0 & \text{for } k = 0 \\ -1 & \text{for } k = j \\ 0 & \text{otherwise.} \end{cases}$$

Since $\alpha f = 0$, we have $(I - P)\,{}^0Nf = f$. Hence

$$\begin{aligned} \lim_n\, [(I + P + \cdots + P^{n-1}) f]_i &= \lim_n\, [(I + P + \cdots + P^{n-1})(I - P)\,{}^0Nf]_i \\ &= \lim_n\, [(I - P^n)\,{}^0Nf]_i \\ &= -{}^0N_{ij} + \lim_n (P^n\,{}^0N)_{ij}. \end{aligned}$$

This limit exists for some i by hypothesis, and hence, by Theorem 9-15,
f is a charge and $\lim_n (P^n\,{}^0N)_{kj}$ exists for all k. Hence $\lim_n (P^n\,{}^0N)$
exists. By Fatou's Theorem,

$$P \lim_n (P^n\,{}^0N) \le \lim_n P P^n\,{}^0N = \lim_n (P^{n+1}\,{}^0N),$$

so that each column of the limit is non-negative superregular. Hence
the columns of the limit are constants by Proposition 6-3.
Definition 9-19: If the indicated limits exist, then

$${}^i\nu_j = \lim_n \sum_k P^{(n)}_{mk}\,{}^iN_{kj}$$

and

$${}^i\lambda_j = \lim_n \sum_k P^{(n)}_{mk}\,{}^iH_{kj}.$$

Notation independent of m in Definition 9-19 is justified by Proposition
9-18. We note that ${}^i\nu_j$ exists if and only if ${}^i\lambda_j$ exists, that ${}^i\nu_j$
is finite, and that $0 \le {}^i\lambda_j \le 1$. Furthermore, ${}^i\lambda_j$ is the probability
that j is entered in the long run before i, and ${}^i\nu_j$ is the mean number of
times in j, in the long run, before reaching i.

Definition 9-20: A chain is normal if for some fixed state 0 and for all
j, ${}^0\lambda_j$ and ${}^0\hat\lambda_j$ both exist.

We note the important fact that the dual of a normal chain is normal.

Proposition 9-21: If P is a normal chain and if, for a given function f
with $\alpha f = 0$, $\sum_k {}^0N_{kk} f_k$ is finite, then its potential g exists, is bounded,
and satisfies $g = [{}^0N - 1\,{}^0\nu] f$.

Proof: Since P is normal, $P^n\,{}^0N \to 1\,{}^0\nu$. Furthermore,

2 < 2 °N„ = 

k k 

and ${}^0Nf^+$ and ${}^0Nf^-$ are both finite-valued. Thus we have dominated
convergence in Proposition 9-13, f is a charge, and $g = [{}^0N - 1\,{}^0\nu] f$.
Since by Theorem 9-15 $|g_i - g_0| = |({}^0Nf)_i| \le \sum_k {}^0N_{kk} |f_k|$, g is
bounded.



Corollary 9-22: If P is normal and if f is a function of finite support
with $\alpha f = 0$, then f is a charge.

Proof: We have $\sum_k {}^0N_{kk} |f_k| < \infty$. Apply Proposition 9-21.

The converse to Corollary 9-22 is the following.

Lemma 9-23: If the function f defined by

$$f_i = \begin{cases} \alpha_j/\alpha_0 & \text{if } i = 0 \\ -1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

is a charge with potential g, then ${}^0\nu_j$ exists and $g_0 = {}^0\nu_j$. If all functions
f of support $\{0, j\}$ with $\alpha f = 0$ are charges and if all signed measures $\mu$
of support $\{0, j\}$ with $\mu 1 = 0$ are charges, then P is normal.

Proof: For the first assertion we have, by Proposition 9-13,

$$\begin{aligned} g_0 &= ({}^0Nf)_0 - \lim_n (P^n\,{}^0Nf)_0 \\ &= -\lim_n (P^n\,{}^0Nf)_0 && \text{since } {}^0N_{0k} = 0 \text{ for all } k \\ &= \lim_n (P^n\,{}^0N)_{0j} && \text{since } {}^0N_{k0} = 0 \text{ for all } k \\ &= {}^0\nu_j. \end{aligned}$$

Hence ${}^0\lambda_j$ exists. For the second assertion we see from the hypothesis
about functions that ${}^0\lambda_j$ exists for all j. Dually we obtain from the
hypothesis about signed measures that ${}^0\hat\lambda_j$ exists for all j. Hence P is
normal.

Definition 9-24: Matrices C and G are defined by

$$C_{ij} = \lim_n \left( N^{(n)}_{jj} - N^{(n)}_{ij} \right)$$

$$G_{ij} = \lim_n \left( N^{(n)}_{ii} \frac{\alpha_j}{\alpha_i} - N^{(n)}_{ij} \right)$$

whenever the indicated limits exist.

According to Lemma 9-2, the quantities defining $C_{ij}$ and $G_{ij}$ are
bounded and non-negative. Thus all entries of C and G which exist
are finite and non-negative. We have further that $C_{ii} = G_{ii} = 0$ for
every i and that

$$G_{ij} = \frac{\alpha_j}{\alpha_i} \hat C_{ji}.$$

Hence G = dual $\hat C$ and C = dual $\hat G$.



Lemma 9-25: $G_{0j}$ exists if and only if ${}^0\nu_j$ exists. If they both exist,
then $G_{0j} = {}^0\nu_j$. Dually $C_{j0} = (\alpha_0/\alpha_j)\,{}^0\hat\nu_j$.

Proof: We need only note that for the potential defined in Lemma
9-23

$$g_0 = \lim_n \left[ N^{(n)}_{00} \frac{\alpha_j}{\alpha_0} - N^{(n)}_{0j} \right] = G_{0j}.$$

Hence $G_{0j}$ exists if and only if $g_0 = {}^0\nu_j$ exists, and then they are equal.
The other result follows by duality.
Theorem 9-26: If for some fixed state 0 and all other states j

(1) ${}^0\lambda_j$ and ${}^0\hat\lambda_j$ both exist,
or (2) $G_{0j}$ and $C_{j0}$ both exist,
or (3) all functions and signed measures with support $\{0, j\}$ and total
charge 0 are potential charges,

then P is normal. Conversely, if P is normal, then C, G, ${}^i\lambda_j$, ${}^i\nu_j$ all
exist (for both P and $\hat P$), $G_{ij} = {}^i\nu_j = {}^i\lambda_j\,{}^iN_{jj}$ and

$$C_{ij} = \frac{\alpha_j}{\alpha_i}\,{}^j\hat\lambda_i\,{}^jN_{ii},$$

and all functions and signed measures of finite support and total
charge 0 are potential charges.

Proof: (1) is the definition of normality. If (2) holds, then ${}^0\nu_j$ and
${}^0\hat\nu_j$ exist by Lemma 9-25, and (1) holds. The sufficiency of (3) was
shown in Lemma 9-23.

Conversely, suppose that P is normal. Then Corollary 9-22 assures
that all f with finite support and $\alpha f = 0$ are potential charges. Consider
the charge

$$f_k = \begin{cases} \alpha_j/\alpha_i & \text{if } k = i \\ -1 & \text{if } k = j \\ 0 & \text{otherwise.} \end{cases}$$

By Lemmas 9-23 and 9-25, its potential has as ith component $G_{ij} = {}^i\nu_j$,
and ${}^i\nu_j = {}^i\lambda_j\,{}^iN_{jj}$ by definition. The remaining assertions follow
by duality.

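Corollary 9-27: If P is normal, then

$${}^0N_{ij} - {}^0\nu_j = -\left[ G_{ij} - G_{i0} (\alpha_j/\alpha_0) \right].$$

Proof: The potential of Lemma 9-23 has 0th component $g_0 = {}^0\nu_j$.
By Theorem 9-15,

$$g_i = g_0 + ({}^0Nf)_i = {}^0\nu_j - {}^0N_{ij}.$$

By definition,

$$g_i = \lim_n \left[ N^{(n)}_{i0} \frac{\alpha_j}{\alpha_0} - N^{(n)}_{ij} \right] = G_{ij} - G_{i0} \frac{\alpha_j}{\alpha_0}.$$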

Corollary 9-28: If P is a normal chain and if f is a function with $\alpha f = 0$
and $\sum_k {}^0N_{kk} f_k$ finite, then f is a charge and its potential g satisfies

$$g = -Gf.$$

Proof: By Proposition 9-21 and Corollary 9-27,

$$g_i = \sum_j \left[ {}^0N_{ij} - {}^0\nu_j \right] f_j = -\sum_j \left[ G_{ij} - G_{i0} \frac{\alpha_j}{\alpha_0} \right] f_j = -(Gf)_i + \frac{G_{i0}}{\alpha_0} (\alpha f) = -(Gf)_i,$$

since $\alpha f = 0$.

The dual of Corollary 9-28 is that in a normal chain if $\mu$ is a signed
measure with $\mu 1 = 0$ and $\sum_k \mu_k\,{}^0N_{kk}/\alpha_k$ finite, then $\mu$ is a charge and its
potential $\nu$ satisfies $\nu = -\mu C$.

We can use Theorem 9-26 to conclude that all symmetric sums of
independent random variables processes are normal; in particular, the
one-dimensional and two-dimensional symmetric random walks are
normal.

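Corollary 9-29: If P is a null or noncyclic ergodic sums of independent
random variables process with $P = P^T$, then P is normal,

$$C_{ij} = G_{ij} = \tfrac{1}{2}\,{}^jN_{ii} \quad \text{and} \quad {}^i\lambda_j = \tfrac{1}{2}.$$

Proof: If we put j in for k and i in for i and j in Theorem 9-7, we
obtain

$$\lim_{n\to\infty} \left[ \left( N^{(n)}_{jj} - N^{(n)}_{ij} \right) \frac{\alpha_i}{\alpha_j} + \left( N^{(n)}_{ii} - N^{(n)}_{ji} \right) \right] = {}^jN_{ii}.$$

Now $\alpha_i = \alpha_j$ and $N^{(n)}_{ii} = N^{(n)}_{jj}$ in sums of independent random variables,
and $N^{(n)}_{ij} = N^{(n)}_{ji}$ in a symmetric process. Hence

$$\lim_n \left[ 2 \left( N^{(n)}_{jj} - N^{(n)}_{ij} \right) \right] = {}^jN_{ii},$$

or $C_{ij} = \tfrac{1}{2}\,{}^jN_{ii}$. Alternatively

$$\lim_n \left[ 2 \left( N^{(n)}_{ii} - N^{(n)}_{ij} \right) \right] = {}^jN_{ii},$$

or $G_{ij} = \tfrac{1}{2}\,{}^jN_{ii} = \tfrac{1}{2}\,{}^iN_{jj}$. Hence ${}^i\lambda_j = \tfrac{1}{2}$.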

The strongest known result for concluding that a function f is in the
domain of the potential operator G is Corollary 9-28, but that result
involves a condition that is hard to check. The definition and theorem
to follow give a more useful condition.

Definition 9-30: A function f in a normal chain is said to be a weak
charge if $\alpha f = 0$ and if $Gf$ and $\hat C f$ are both finite-valued. A signed
measure $\mu$ is called a weak charge if $\mu 1 = 0$ and if $\mu C$ and $\mu \hat G$ are both
finite-valued.

The dual of a weak charge for P is a weak charge for $\hat P$.

Theorem 9-31: If P is a normal chain and if f is a weak charge, then f
is a charge and its potential g is bounded and satisfies $g = -Gf$.

Proof: Set i and j both equal to k in Corollary 9-27. Then

$${}^0N_{kk} - {}^0\nu_k = -\left[ G_{kk} - G_{k0} (\alpha_k/\alpha_0) \right] = G_{k0} (\alpha_k/\alpha_0).$$

Since ${}^0\nu_k = G_{0k}$, ${}^0N_{kk} = G_{k0} (\alpha_k/\alpha_0) + G_{0k}$. Thus

$$\sum_k {}^0N_{kk} f_k = \sum_k \left( G_{k0} \frac{\alpha_k}{\alpha_0} + G_{0k} \right) f_k = (\hat C f)_0 + (Gf)_0,$$

and $\sum_k {}^0N_{kk} f_k$ must be finite. The result follows by Proposition 9-21
and Corollary 9-28.

Dually if P is normal and $\mu$ is a weak charge, then $\nu$ exists, is
bounded by a multiple of $\alpha$, and satisfies $\nu = -\mu C$.

For the symmetric random walks in one and two dimensions, we
have $C = G = \hat C = \hat G$ by Corollary 9-29. Therefore, Theorem 9-31
states for these cases that if $\alpha f = 0$ and if $Gf$ is finite-valued, then f is a
charge and its potential is bounded and satisfies $g = -Gf$; this is the
analog of the Brownian motion result.

We shall now introduce the recurrent analog of the equilibrium set
for transient chains, namely the small ergodic set. Potentials with
supports in small ergodic sets will be found to have special properties.

Lemma 9-32: If h is a non-negative bounded column vector such that
$\lim_n P^n h$ exists, then the limit is a constant vector.

Proof: By Fatou's Theorem

$$P \lim_n (P^n h) \le \lim_n P^{n+1} h.$$

Thus the limit is finite-valued, non-negative, and superregular. Hence
it is constant.

In particular, if $\lim_n P^n B^E$ exists, then it has constant columns.

Definition 9-33: If the indicated limit exists, then

$$\lim_{n\to\infty} P^n B^E = 1 \lambda^E.$$

The entry $\lambda^E_j$ is the probability in the long run of entering E at j.
In the special case where E is a two-point set, we have ${}^i\lambda_j = \lambda^{\{i,j\}}_j$.

Proposition 9-34: If $\lambda^E$ exists, then $\lambda^E \ge 0$, $\lambda^E_{\tilde E} = 0$, and $\lambda^E 1 \le 1$.

Proof: The first two assertions follow from the facts that $(P^n B^E)_{ij} \ge 0$
for every n and that $(P^n B^E)_{ij} = 0$ if j is not in E. Also,

$$(\lambda^E 1) 1 = (1 \lambda^E) 1 = (\lim_n P^n B^E) 1 \le \lim_n (P^n B^E 1) = 1,$$

so that $\lambda^E 1 \le 1$.

Definition 9-35: If $\lambda^E$ exists and is such that $\lambda^E 1 = 1$, then E is said
to be a small set.

To justify the name small set and to prove existence of small sets,
we shall show that subsets of small sets are small and that in a normal
chain all finite sets are small.

Proposition 9-36: If E is a subset of a small set F, then E is small and

$$\lambda^E = \lambda^F B^E.$$

Proof: By Proposition 5-8, we have $B^E = B^F B^E$. Since each row
of $P^n B^F$ tends to the probability vector $\lambda^F$, it follows from Proposition
1-57 that

$$P^n B^E = P^n B^F B^E \to 1 \lambda^F B^E.$$

Thus $\lambda^E$ exists and $\lambda^E = \lambda^F B^E$. In addition, $\lambda^E 1 = \lambda^F B^E 1 = \lambda^F 1 = 1$.



Lemma 9-37: If P is a normal chain and if $\sum_k {}^0N_{kk} B^E_{kj}$ is finite for
every $j \in E$, then $\lambda^E$ exists and the columns of $B^E - 1 \lambda^E$ are potentials
with support in E.

Proof:

$$(I - P) B^E = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix},$$

and the columns on the right are charges for a bounded potential by
Proposition 9-21. Thus the limits

$$\lim_n (I + P + \cdots + P^{n-1}) \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} = \lim_n (I - P^n) B^E$$

exist, and the resulting potentials are the columns of $B^E - 1 \lambda^E$.



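Proposition 9-38: In a normal chain all finite sets are small.

Proof: Let E be a finite set. By Lemma 9-37, $\lambda^E$ exists. Moreover

$$\begin{aligned} \lambda^E 1 = \sum_{j \in E} \lambda^E_j &= \sum_{j \in E} \lim_n (P^n B^E)_{ij} \\ &= \lim_n \sum_{j \in E} (P^n B^E)_{ij} \\ &= \lim_n (P^n B^E 1)_i \\ &= 1. \end{aligned}$$

Corollary 9-39: In a normal chain ${}^i\lambda_j + {}^j\lambda_i = 1$.

Proof: Since ${}^i\lambda_j = \lambda^{\{i,j\}}_j$ and ${}^j\lambda_i = \lambda^{\{i,j\}}_i$, we may apply Proposition
9-38.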

Definition 9-40: A set E of states is an ergodic set if $P^E$ is an ergodic
chain.

Proposition 9-41: E is ergodic if and only if $\alpha_E 1 < \infty$. All finite
sets are ergodic, a subset of an ergodic set is ergodic, and the union of
two ergodic sets is ergodic.

Proof: $\alpha_E P^E = \alpha_E$. Hence $P^E$ is ergodic if and only if $\alpha_E 1 < \infty$.
Thus the ergodic sets are the sets of finite $\alpha$-measure; the remaining
statements follow from this observation.

Small sets and ergodic sets might seem to be related notions, but we
shall see later that they are actually independent.

Proposition 9-42: If E is a small set and if g is a bounded potential
with support in E, then g is regular at points of $\tilde E$ and $\lambda^E g = 0$.
Conversely, if E is small and ergodic and if g is a bounded function
which is regular at points of $\tilde E$, and which satisfies $\lambda^E g = 0$, then g is a
bounded potential with support in E.

Proof: Let g be a bounded potential with support in E. Since
$(I - P)g$ is the charge, g is regular in $\tilde E$. By Theorem 9-15, $B^E g = g$
and hence $P^n B^E g = P^n g$. Since $P^n g \to 0$, $P^n B^E g \to 0$. But by
Proposition 1-57, since E is small, $P^n B^E g \to 1 (\lambda^E g)$. Hence $\lambda^E g = 0$.

For the converse $g(x_n)$ is a bounded martingale at points of $\tilde E$, since
g is bounded and regular in $\tilde E$. By Corollary 3-16 with the stopping
time the time to reach E, we have $g = B^E g$. Now, since E is small,

$$P^n g = P^n B^E g \to 1 (\lambda^E g) = 0 \quad \text{by Proposition 1-57.}$$

In addition,

$$\alpha [(I - P) g] = \alpha [(I - P) B^E g] = \alpha \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} g_E \\ g_{\tilde E} \end{pmatrix} = \alpha_E (g_E - P^E g_E).$$

Since E is ergodic and g is bounded, $\alpha_E g_E$ is finite and hence

$$\alpha_E (g_E - P^E g_E) = \alpha_E g_E - (\alpha_E P^E) g_E = 0.$$

Therefore g is a potential by Corollary 9-16, and it is bounded by
hypothesis. Its charge $\begin{pmatrix} (I - P^E) g_E \\ 0 \end{pmatrix}$ has support in E.



For potentials of total charge zero in recurrent chains, the Principle
of Balayage takes the following form.

Proposition 9-43: Let E be a small ergodic set. If x is a function
bounded on E, then there is a unique potential g with support in E
which differs from x on E by a constant function. The potential is
$g = B^E x - (\lambda^E x) 1$.

Proof: Let $g = B^E x - (\lambda^E x) 1$. Then g is bounded, and

$$(I - P) g = \begin{pmatrix} (I - P^E) x_E \\ 0 \end{pmatrix};$$

therefore g is regular on $\tilde E$. Since $\lambda^E g = 0$, g is a potential with support
in E, by Proposition 9-42; and g differs from x by the constant $\lambda^E x$
on E.

For uniqueness, let $g'$ be another such potential. Then $g_E = g'_E + k 1$
and $g - g' = B^E g - B^E g'$. Then $g - g' = k 1$. Since $P^n (g - g') \to 0$,
we must have $k = 0$. Hence $g = g'$.



The result to follow is the total-charge-zero version of the Principle
of Condensers.

Proposition 9-44: Suppose that E and F are disjoint sets such that
$E \cup F$ is small and ergodic. Then there exists a function h such that:

(1) $h_i = 1$ if $i \in E$ and $h_j = 0$ if $j \in F$.

(2) $f = (I - P) h$ has its positive values in E and its negative values
in F.

(3) h is the sum of a potential and a constant function.

Proof: For existence, let x be a function which is 1 on E and 0 on F,
and set $h = B^{E \cup F} x$. By Proposition 9-43, (1) and (3) are satisfied and
f has support in $E \cup F$. For (2) we note that, if $i \in E$, then $f_i$ is the
probability of return to F before E and, if $i \in F$, then $-f_i$ is the
probability of return to E before F.

For uniqueness, let x be a function satisfying (1), (2), and (3).
Then $x = g + c 1$, where g is a potential, by (3). By (2),

$$(I - P) g = (I - P) x$$

vanishes outside of $E \cup F$ so that g has support in $E \cup F$. Hence
$g = B^{E \cup F} g$. Therefore,

$$B^{E \cup F} x = B^{E \cup F} g + B^{E \cup F} (c 1) = g + c 1 = x,$$

and x is uniquely determined by its values on $E \cup F$, which are fixed
by (1).

We conclude this section with some results about normal chains 
which will be needed later. 

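Proposition 9-45: In a normal chain

$$G_{kj} + G_{ik} \frac{\alpha_j}{\alpha_k} - G_{ij} = {}^kN_{ij}$$

$$C_{kj} + C_{ik} \frac{\alpha_j}{\alpha_k} - C_{ij} = {}^kN_{ij}$$

$$G_{kj} + G_{jk} \frac{\alpha_j}{\alpha_k} = {}^kN_{jj}$$

$$C_{kj} + C_{jk} \frac{\alpha_j}{\alpha_k} = {}^kN_{jj}.$$

Proof: The first two expressions follow from Theorem 9-7 and
Definition 9-24. For the other two, set $i = j$.

By Lemma 9-32, if $\lim_n (P^n\,{}^EN)$ exists, then it is finite-valued and
has constant columns.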

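Definition 9-46: ${}^E\nu_j = \lim_n \sum_k (P^{(n)}_{hk}\,{}^EN_{kj})$, provided the limit exists.

Proposition 9-47: ${}^E\nu_j$ exists if and only if $\lambda^{E \cup \{j\}}_j$ exists. If they both
exist, then ${}^E\nu_j = \lambda^{E \cup \{j\}}_j\,{}^EN_{jj}$. Hence in a normal chain ${}^E\nu_j$ exists for all
finite sets E.

Proof: $\lambda^{E \cup \{j\}}_j = \lim_n \sum_k (P^n)_{hk} B^{E \cup \{j\}}_{kj}$ and

$${}^E\nu_j = \lim_n \sum_k (P^n)_{hk}\,{}^EN_{kj} = \lim_n \sum_k (P^n)_{hk} B^{E \cup \{j\}}_{kj}\,{}^EN_{jj}.$$

Remark: Actually, it can be shown that if E is small, then $E \cup \{j\}$
is small. Hence ${}^E\nu_j$ exists for all small sets. (See the problems.)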

3. Ergodic chains 

For this section let P be a noncyclic ergodic chain, and choose $\alpha$ so
that $\alpha 1 = 1$. Then $L = \lim_n P^n = A = 1 \alpha$. The dual of P, namely
$\hat P$, is also noncyclic ergodic, and the mean first passage time matrix $\hat M$ for $\hat P$
has all finite entries (see Proposition 6-42).

We begin by proving that all noncyclic ergodic chains are normal and
by giving an existence theorem for potentials.

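Lemma 9-48: For every i and j,

$$0 \le N^{(n)}_{jj} - N^{(n)}_{ij} < \infty \quad \text{and} \quad \lim_n \left[ N^{(n)}_{jj} - N^{(n)}_{ij} \right] = \alpha_j M_{ij}.$$

Proof: Summing over the powers of P in Lemma 6-34, we obtain

$$N^{(n)}_{ij} = \sum_{k=0}^{n} F^{(k)}_{ij} N^{(n-k)}_{jj},$$

where we use the convention $N^{(m)} = 0$ if $m < 0$. Since $\sum_{k=0}^{\infty} F^{(k)}_{ij} = H_{ij} = 1$, we have

$$N^{(n)}_{jj} - N^{(n)}_{ij} = \sum_{k=0}^{\infty} F^{(k)}_{ij} \left[ N^{(n)}_{jj} - N^{(n-k)}_{jj} \right].$$

As $n \to \infty$, we obtain

$$\lim_n \left[ N^{(n)}_{jj} - N^{(n)}_{ij} \right] = \lim_n \sum_{k=0}^{\infty} F^{(k)}_{ij} \left[ N^{(n)}_{jj} - N^{(n-k)}_{jj} \right] = \sum_{k=0}^{\infty} F^{(k)}_{ij} \lim_n \left[ N^{(n)}_{jj} - N^{(n-k)}_{jj} \right]$$

by dominated convergence, since $N^{(n)}_{jj} - N^{(n-k)}_{jj} \le k$ and $\sum_{k \ge 0} k F^{(k)}_{ij} = M_{ij} < \infty$. Thus

$$\lim_n \left[ N^{(n)}_{jj} - N^{(n)}_{ij} \right] = \sum_{k=0}^{\infty} F^{(k)}_{ij} \lim_n \sum_{m=n-k+1}^{n} P^{(m)}_{jj} = \alpha_j \sum_{k=0}^{\infty} k F^{(k)}_{ij} = \alpha_j M_{ij}.$$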
Theorem 9-49: Every noncyclic ergodic chain is normal, and
$C_{ij} = \alpha_j M_{ij}$. In matrix form $C = M D^{-1}$.

Proof: By Lemma 9-48, C exists and satisfies $C_{ij} = \alpha_j M_{ij}$. Since
$\hat P$ is noncyclic ergodic, $\hat C$ exists and hence G exists. Thus P is normal
by Theorem 9-26.

Dually, $G_{ij} = \alpha_j \hat M_{ji}$, or $G = \hat M^T D^{-1}$.
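As a numerical sanity check of Theorem 9-49 and of the first identity of Corollary 9-51, one can compare truncated sums of powers of P with mean first passage times in a small chain. The following sketch rests on assumptions that are not part of the text: the 3-state matrix `P` is a hypothetical noncyclic ergodic chain, the helper names are invented for the illustration, and limits are replaced by long finite sums and fixed-point iterations.

```python
# Hypothetical 3-state noncyclic ergodic chain (illustration only).
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
]
n = len(P)


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def stationary(P, sweeps=5000):
    """Regular probability measure alpha, approximated by power iteration."""
    a = [1.0 / len(P)] * len(P)
    for _ in range(sweeps):
        a = [sum(a[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return a


def mean_first_passage(P, sweeps=5000):
    """M[i][j] = mean time to reach j from i, with M[j][j] = 0."""
    m = len(P)
    M = [[0.0] * m for _ in range(m)]
    for _ in range(sweeps):
        M = [
            [
                0.0 if i == j else 1.0 + sum(P[i][k] * M[k][j] for k in range(m) if k != j)
                for j in range(m)
            ]
            for i in range(m)
        ]
    return M


def c_matrix(P, terms=500):
    """C[i][j] approximated by the truncated sum of (P^k)_jj - (P^k)_ij."""
    m = len(P)
    acc = [[0.0] * m for _ in range(m)]
    Pk = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(terms):
        acc = [[acc[i][j] + Pk[i][j] for j in range(m)] for i in range(m)]
        Pk = matmul(Pk, P)
    return [[acc[j][j] - acc[i][j] for j in range(m)] for i in range(m)]


alpha = stationary(P)
M = mean_first_passage(P)
C = c_matrix(P)

# Theorem 9-49: C_ij = alpha_j * M_ij.
gap_c = max(abs(C[i][j] - alpha[j] * M[i][j]) for i in range(n) for j in range(n))

# Corollary 9-51: (I - P)(-C) = PC - C = I - A, where A = 1 alpha.
PC = matmul(P, C)
gap_51 = max(
    abs((PC[i][j] - C[i][j]) - ((1.0 if i == j else 0.0) - alpha[j]))
    for i in range(n)
    for j in range(n)
)
print(gap_c, gap_51)
```

Both deviations should be at the level of rounding error for a chain this small; for chains that mix slowly the number of terms and sweeps would need to be increased.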

For noncyclic ergodic chains we can prove a stronger result than 
Theorem 9-31 about the existence of potentials. 

Theorem 9-50: If f is a function with $\alpha f = 0$ and $Gf$ finite-valued,
then f is a charge and its potential g is such that $g = -Gf$ and $\alpha g$ is
finite. If, in addition, $\hat C f$ is finite-valued, then g is bounded. Dually,
if $\mu$ is a signed measure with $\mu 1 = 0$ and $\mu C$ finite-valued, then $\mu$ is a
charge and its potential $\nu$ is such that $\nu = -\mu C$ and $\nu 1$ is finite; if, in
addition, $\mu \hat G$ is finite-valued, then $\nu$ is bounded by a multiple of $\alpha$.

Proof: We shall prove the dual statements. 

k k 

since Now by Lemma 

9-48, and l^k^ki — is finite by hypothesis. Hence, by 

dominated convergence, /x is a charge and 

— ~^l^kPki' 

k 

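By the dual of Theorem 9-15,

$$\nu = \mu\,{}^0N + \frac{\nu_0}{\alpha_0} \alpha.$$

To show that $\nu 1$ is finite, it suffices to show that $|\mu|\,{}^0N 1 < \infty$. But

$$|\mu|\,{}^0N 1 = \sum_i |\mu_i| M_{i0} = \frac{1}{\alpha_0} \sum_i |\mu_i| C_{i0} < \infty \quad \text{by hypothesis.}$$

If $\mu \hat G$ is finite-valued, then $\nu$ is bounded by a multiple of $\alpha$, according to
Theorem 9-31.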







Corollary 9-51:

$$(I - P)(-C) = I - A$$

and

$$(-G)(I - P) = I - A.$$

Proof: A row of $I - P$ is a charge whose potential is the corresponding
row of $I - A$. Since

$$0 \le (PC)_{ij} = \sum_k P_{ik} C_{kj} = \sum_k P_{ik} M_{kj} \alpha_j = \alpha_j (\bar M_{ij} - 1) < \infty$$

by Proposition 6-41, $(I - P)(-C) = PC - C$ is finite-valued. Hence $(I - P)(-C) =
I - A$ by Theorem 9-50. The second result is dual.

From Theorem 9-50 we have a sufficient condition on charges for 
their potentials to exist. We turn now to conditions on functions to 
ensure that they are potentials, the same problem that we touched on 
for null chains at the end of Section 1. We shall prove as Theorem 
9-53 the result for noncyclic ergodic chains that corresponds to 
Corollary 9-17 for null chains. 

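Lemma 9-52: If h is a function for which $\alpha h$ is finite, then $P^n h \to A h$.

Proof:

$$\lim_n (P^n h)_i = \lim_n \sum_j P^{(n)}_{ij} h_j = \lim_n \sum_j (\alpha_j h_j) \frac{\hat P^{(n)}_{ji}}{\alpha_i} = \sum_j (\alpha_j h_j) \frac{\alpha_i}{\alpha_i} = \alpha h,$$

where the limit is taken inside the sum by dominated convergence.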



Theorem 9-53: If g is a function for which $\alpha g$ is finite, then g is a
potential if and only if $\alpha g = 0$. If g is such a potential and if f is its
charge, then $g = (I - A)\,{}^0Nf$.

Proof: Set $f = (I - P) g$. Then $\alpha f = \alpha [(I - P) g] = \alpha g - \alpha P g = 0$.
By Lemma 9-52, $P^n g \to A g = 1 (\alpha g)$. Therefore, by Corollary 9-16,
g is a potential if and only if $\alpha g = 0$. If g is such a potential, then by
Theorem 9-15

$$g = {}^0Nf + g_0 1.$$

If we multiply through by $\alpha$, we obtain

$$0 = \alpha\,{}^0Nf + g_0.$$

Hence

$$g = {}^0Nf - A\,{}^0Nf = (I - A)\,{}^0Nf.$$

Thus integrable potentials, those for which ag is finite, are precisely 
those integrable functions whose integral is zero. In particular, every 
bounded potential has integral zero. 

Examples show that associativity of $A({}^0Nf)$ in Theorem 9-53 may
fail. However, if it does hold, then $Gf$ is finite-valued and $g = -Gf$.
In fact, since the columns of ${}^0N$ are bounded, $\alpha\,{}^0N$ is finite-valued.
Hence, by Lemma 9-52,

$$\lim_n P^n\,{}^0N = A\,{}^0N.$$

On the other hand, by definition

$$\lim_n P^n\,{}^0N = 1\,{}^0\nu.$$

Hence by associativity

$$g = {}^0Nf - 1\,{}^0\nu f = [{}^0N - 1\,{}^0\nu] f.$$

Therefore, by Corollary 9-27,

$$g_i = -(Gf)_i + G_{i0} \frac{\alpha f}{\alpha_0} = -(Gf)_i \quad \text{since } \alpha f = 0.$$

For a further discussion of this point, see the Additional Notes. 

The existence of small sets for noncyclic ergodic chains is settled by 
the following proposition. Obviously all sets in such chains are ergodic 
sets. 

Proposition 9-55: In a noncyclic ergodic chain all sets are small, and
$\lambda^E = \alpha B^E$ and ${}^E\nu = \alpha\,{}^EN$. In particular, for the set S of all states,
$\lambda^S = \alpha$.

Proof: By Proposition 1-57, $P^n B^E \to A B^E$. Hence $\lambda^E = \alpha B^E$ and
$\lambda^E 1 = \alpha B^E 1 = \alpha 1 = 1$, so that every set E is small. For $E = S$,
$B^E = I$ and thus $\lambda^E = \alpha$. The assertion about ${}^E\nu$ is proved similarly.
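The statement of Proposition 9-55 lends itself to a direct numerical check in a finite chain: every row of a high power of P times the hitting matrix should approach the single row vector obtained by applying the regular measure to the hitting matrix. The sketch below is only an illustration; the 4-state matrix `P`, the set `E`, the power 1000 used in place of the limit, and all helper names are assumptions made for the example.

```python
# Hypothetical 4-state noncyclic ergodic chain (illustration only).
P = [
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.25, 0.50, 0.00],
    [0.00, 0.30, 0.30, 0.40],
    [0.20, 0.00, 0.30, 0.50],
]
n = len(P)
E = {0, 1}


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def stationary(P, sweeps=5000):
    """Regular probability measure alpha, approximated by power iteration."""
    a = [1.0 / len(P)] * len(P)
    for _ in range(sweeps):
        a = [sum(a[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return a


def hitting_matrix(P, E, sweeps=3000):
    """B^E: probability that the chain started at i first enters E at j."""
    m = len(P)
    outside = [i for i in range(m) if i not in E]
    # Taboo matrix on the states outside E, as a partial sum of powers.
    N = [[0.0] * m for _ in range(m)]
    for _ in range(sweeps):
        new = [[0.0] * m for _ in range(m)]
        for i in outside:
            for j in outside:
                new[i][j] = (1.0 if i == j else 0.0) + sum(
                    P[i][k] * N[k][j] for k in outside
                )
        N = new
    B = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in E:
            if i in E:
                B[i][j] = 1.0 if i == j else 0.0
            else:
                B[i][j] = sum(N[i][k] * P[k][j] for k in outside)
    return B


alpha = stationary(P)
B = hitting_matrix(P, E)

# Candidate for lambda^E according to Proposition 9-55.
lam = [sum(alpha[i] * B[i][j] for i in range(n)) for j in range(n)]

# Stand-in for the limit of P^n B^E: a high power of P times B^E.
Pn = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for _ in range(1000):
    Pn = matmul(Pn, P)
rows = matmul(Pn, B)

gap = max(abs(rows[i][j] - lam[j]) for i in range(n) for j in range(n))
print(lam, sum(lam), gap)
```

The entries of `lam` outside E are zero and the total mass is one, which is the smallness condition of Definition 9-35 for this example.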
Corollary 9-56: If $\alpha h$ is finite and if $(I - P) h = f$, then $B^E h$ is finite-valued
and $B^E h = h - {}^ENf$. If f has support in E, then $B^E h = h$.
If in addition h is bounded, then $\lambda^E h = \alpha h$.

Proof: Let $g = h - (\alpha h) 1$. Then g is a potential by Theorem 9-53,
and $(I - P) g = (I - P) h = f$. The proof of conclusion (4) of
Theorem 9-15 shows that $B^E g$ is finite-valued and $B^E g = g - {}^ENf$.
Therefore $B^E h$ is finite-valued and $B^E h = h - {}^ENf$. If f has support
in E, then ${}^ENf = 0$ since ${}^EN_{ij} = 0$ for j in E. Hence $B^E h = h$ and
$\alpha (B^E h) = \alpha h$. If h is bounded, associativity holds, and we conclude
from Proposition 9-55 that $\lambda^E h = \alpha h$.



For total-charge-zero potentials in noncyclic ergodic chains, the
following is the form that the Principle of Balayage takes.

Proposition 9-57: If $\alpha h$ is finite and $\lambda^E h$ is finite, then there is a unique
potential g with support in E which differs from h by a constant on E.

Proof: For existence $B^E h$ is finite-valued by Corollary 9-56, and we
let

$$g = B^E h - (\lambda^E h) 1.$$

Then g differs from h by the constant $\lambda^E h$ on E. Since $\alpha g = 0$, g is a
potential by Theorem 9-53. Moreover,

$$f = (I - P) g = (I - P) B^E h = \begin{pmatrix} (I - P^E) h_E \\ 0 \end{pmatrix},$$

so that the support of f is in E. For uniqueness, let $g'$ be another such
potential. Then $g - g'$ is constant on E. Since $B^E (g - g') = g - g'$,
$g - g'$ is constant everywhere. But if $P^n (g - g') \to 0$, then $g - g'$
must be 0. Hence g is unique.

The following summary of the results for ergodic chains may be
helpful. We consider the set of all states as a denumerable measure
space of finite total measure $\alpha 1$; the measure assigned to state i is $\alpha_i$.
A function h is integrable if $\alpha h$ is finite; we restrict our attention to
integrable functions.

We know that an integrable function is uniquely representable as the
sum of a constant and a potential. The constant is the integral of the
function. Hence an integrable function is a potential if and only if its
integral is zero.

We conclude this section with some results about M and $\hat M$ which
hold for ergodic chains.

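Proposition 9-58:

$$M_{ik} + M_{kj} - M_{ij} = \frac{{}^kN_{ij}}{\alpha_j},$$

$$M_{kj} + M_{jk} = \frac{{}^kN_{jj}}{\alpha_j}.$$

Proof: In Proposition 9-45, substitute $\alpha_j M_{ij}$ for $C_{ij}$ in the second and
fourth equations.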

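Lemma 9-59: In any infinite ergodic chain, for fixed i and k

$$\lim_j {}^kH_{ij} = \lim_j {}^kN_{ij} = 0.$$

Proof: Since ${}^kH_{ij} \le {}^kN_{ij}$, it suffices to prove the result for ${}^kN_{ij}$.
But $\sum_j {}^kN_{ij} = M_{ik} < \infty$, so that $\lim_j {}^kN_{ij} = 0$.

Proposition 9-60: In any infinite noncyclic ergodic chain, for fixed i

$$\lim_j {}^i\lambda_j = 0.$$

Proof:

$$\begin{aligned} \lim_j {}^i\lambda_j &= \lim_j \sum_k \alpha_k B^{\{i,j\}}_{kj} && \text{by Proposition 9-55} \\ &= \sum_k \alpha_k \lim_j B^{\{i,j\}}_{kj} && \text{by dominated convergence} \\ &= 0 && \text{by Lemma 9-59.} \end{aligned}$$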

Proposition 9-61 : for i in E. 

Proof: By Proposition 6 - 16 , 

ttyjSfi = for i in E. 

Summing on j gives 

3 3 
Thus Proposition 6-24 is the assertion that all sets are small in an
ergodic chain.



Lemma 9-62: If then 



and 



- 1^7’ 



s. 



— Sj^ — /§iy 

= Mi, 












Proof: The first result is a restatement of Proposition 9-58. Since 
we have /S^y = S^j-; iS^y = Sji by definition. Finally, the 
last two results follow from the identities 

«y 

which are consequences of Theorem 9-49 and Lemma 9-25. 



From this lemma we see that

$$\hat M_{ij} = {}^j\lambda_i (M_{ij} + M_{ji}),$$

a formula which gives a means of computing $\hat M$ from quantities in P.
Moreover, from Proposition 9-61 we have 

SO that the lemma gives, on multiplication by Sji, 

j-Kf _ 

if,y = M, = «i-^^i.(i.;> 

at 

or _ 



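Proposition 9-63: In any infinite noncyclic ergodic chain

$$\lim_j \frac{M_{ji}}{M_{ij}} = 0,$$

$$\lim_j M_{ij} = +\infty,$$

and

$$\lim_j \frac{C_{ij} - G_{ij}}{\alpha_j} = +\infty.$$

Proof: By Lemma 9-62,

$$\frac{M_{ji}}{M_{ij}} = \frac{{}^i\hat\lambda_j}{{}^j\hat\lambda_i}.$$

By Proposition 9-60 the numerator tends to 0 and the denominator
tends to 1. Hence the first assertion follows. Therefore, for all but
finitely many states j,

$$2 M_{ij} > M_{ji} + M_{ij} \ge \bar M_{jj} = \frac{1}{\alpha_j}.$$

Since $\sum_j \alpha_j < \infty$, $\alpha_j \to 0$ and $M_{ij} \to \infty$. Finally

$$\frac{C_{ij} - G_{ij}}{\alpha_j} = M_{ij} - \hat M_{ji} = M_{ij} - {}^i\lambda_j (M_{ij} + M_{ji}) = M_{ij} \left[ 1 - {}^i\lambda_j \left( 1 + \frac{M_{ji}}{M_{ij}} \right) \right].$$

The factor in brackets tends to 1 since ${}^i\lambda_j \to 0$ and $M_{ji}/M_{ij} \to 0$. Thus
the third assertion of the proposition follows from the fact that
$M_{ij} \to \infty$.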



Corollary 9-64: In any infinite noncyclic ergodic chain, $C \ne G$.

Proof: If $C = G$, then $(C_{ij} - G_{ij})/\alpha_j = 0$ for every j.

On the other hand, there are many finite ergodic chains with $C = G$.
An example is

$$P = \begin{pmatrix} 1 - a & a \\ a & 1 - a \end{pmatrix}$$

for $0 < a < 1$.
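A short computation illustrates the contrast between Corollary 9-64 and the finite case. The sketch assumes the example chain is the symmetric two-state chain with off-diagonal entry a (taken here as a = 0.3), and compares it with a hypothetical asymmetric two-state chain; the helper names and the truncation of the defining limits of Definition 9-24 to 400 terms are choices made only for this illustration.

```python
def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def stationary(P, sweeps=2000):
    """Regular probability measure alpha, approximated by power iteration."""
    a = [1.0 / len(P)] * len(P)
    for _ in range(sweeps):
        a = [sum(a[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return a


def c_and_g(P, terms=400):
    """Truncated versions of the limits defining C and G in Definition 9-24."""
    m = len(P)
    alpha = stationary(P)
    acc = [[0.0] * m for _ in range(m)]  # partial sum I + P + ... + P^(terms-1)
    Pk = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(terms):
        acc = [[acc[i][j] + Pk[i][j] for j in range(m)] for i in range(m)]
        Pk = matmul(Pk, P)
    C = [[acc[j][j] - acc[i][j] for j in range(m)] for i in range(m)]
    G = [[acc[i][i] * alpha[j] / alpha[i] - acc[i][j] for j in range(m)] for i in range(m)]
    return C, G


a = 0.3
C_sym, G_sym = c_and_g([[1 - a, a], [a, 1 - a]])

# Hypothetical asymmetric two-state chain, for contrast.
C_asym, G_asym = c_and_g([[0.7, 0.3], [0.6, 0.4]])

print(C_sym, G_sym)
print(C_asym, G_asym)
```

For the symmetric chain the two matrices agree; for the asymmetric chain the off-diagonal entries differ, so equality of C and G is a special feature even among finite chains.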



4. Classes of ergodic chains

Let P be a recurrent chain. In this section we shall investigate the
finiteness of the rth moments of certain random times and obtain
formulas for these moments. Let

$$M^{(r)}_{ij} = M_i[t_j^r], \qquad b^{(r)}_j = M_j[\bar t_j^r], \qquad c^{(r)}_j = \sum_k \alpha_k M_k[t_j^r].$$

These quantities are rth moments of the first passage times, the return
times, and the equilibrium first passage times, respectively.

Let 0 be a distinguished state and write



{0} {0} 

{0} /Poo U\ 

{0} I R q) 



and 




{ 0 } 

d). 



The chain obtained from P by making 0 absorbing is an absorbing 
chain and has N = SiS its fundamental matrix. The time to 

absorption a in the chain is the same as the time tg to reach 0 in the 
chain P. As in Section 8-8 we let be a column vector indexed by 
the transient states of °P and satisfying 



ar' = M,[V] = 



Since = 0, Proposition 8-68 enables us to compute any column 
of ilf From the relation we also have a formula for the 

computation of An easy calculation gives in terms of 
with m < r : 



k 

= Poo + 2 Po, 2 (1) ^.[* 0 ”] 

Jc^o m = 0 

= It) 



The first three propositions to follow give conditions for the finiteness 
of and 



Proposition 9-65: If r > 0, then < oo if and only if < oo. 

Proof: Since for m < r, < oo if and only if < oo. 

Multiplying the inequalities of Corollary 8-69 through by U gives 

< cUNa^^-^^ < dUa^^\ 

For j / 0, 

(C/iV), = 2 Po.°/V./ = '>iVo, = ^'- 

kto “o 

Thus 

= — 2 

^0 ; 0 ^0 
and < 00 if and only if 
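Lemma 9-66: For any two distinct states i and j, let $t_{i,j}$
be the additional time needed to reach j after i. Then for any
powers r and s

$$M_k[t_i^r\, t_{i,j}^s] = M_k[t_i^r]\, M_i[t_j^s].$$

Proof:

$$\begin{aligned} M_k[t_i^r\, t_{i,j}^s] &= \sum_{m,n} \Pr_k[t_i = m \wedge t_{i,j} = n]\, m^r n^s \\ &= \sum_{m,n} \Pr_k[t_i = m]\, \Pr_k[t_{i,j} = n \mid t_i = m]\, m^r n^s \\ &= \sum_{m,n} \Pr_k[t_i = m]\, \Pr_i[t_j = n]\, m^r n^s && \text{by the strong Markov property} \\ &= \left( \sum_m \Pr_k[t_i = m]\, m^r \right) \left( \sum_n \Pr_i[t_j = n]\, n^s \right) \\ &= M_k[t_i^r]\, M_i[t_j^s]. \end{aligned}$$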

Proposition 9-67 ; The vector 6^’*^ is finite-valued if and only if is 
finite-valued. 



Proof: By definition ¥p = Let t = min (t^, Ij) for i ^ j, 

and let u = ty — t (or 0 if t = oo). Then ty = t -h u > u, so that 



bp > My[u^] 

= 2 Mfc[ty^] by Theorem 4-11 

k 

> = i] M,[ty’^] 



Since > 0, if bp < oo, then JUp < oo for all i. 

Conversely, suppose that is finite-valued. Since ty < -f- y 

for any state i / j, 



bp < My[(t, + t,y)^] 

=10 



= 2 (") 
m = 0 W/ 



7"] 






by Lemma 9-66. 



Hence 6^*'^ is finite-valued. 
Proposition 9-68: If $b^{(r)}_0 < \infty$ for some state 0, then $b^{(r)}_j < \infty$ for
all j.

Proof: The proof is by induction on r. For r = 0, = 1, and the 

result is trivial. Suppose that the result holds forr<n and that 
joi + i) ^ QQ Xhen b^J^ < oo for r < n, so that bp < oo for r < n by 
inductive assumption. By Proposition 9-67, is finite-valued for 
r < n, and by Proposition 9-65, cP < oo for r < n. Thus for j / 0 

k 

^ 2 “fc + * 0 .,)”] since < to + *o.; 

k 

k r = 0 / 

= 2“'^2 

k r = 0 / 

= 2^ < (X). 

Therefore + < oo by Proposition 9-65. 

Finally we show the connection between the moments in P and those
in $\hat P$.

Lemma 9-69: in a recurrent chain. 

Proof: By Lemma 6-34, for n > \, 



pin) _ V p(k)pin-k) 

^00 — Z , ^ 00^00 

k=l 

By induction we see that is a function of for k < n\ similarly 
is the same function of P^q^q\ Since P^J'o^ = P^Jo = 

Proposition 9-70: $\hat b^{(r)} = b^{(r)}$ and $\hat c^{(r)} = c^{(r)}$.

Proof: By Lemma 9-69, 

= Mo[to1 = 2 = 2 ^'^00 = Mo[lo1 = 
For the second assertion,

Pr^[to = ^] = 2 - lO 

= 2 ^O^Okm - 1 * • -^kzki^kiU 

where each of the sums is taken over the denumerable number of 
sequences with each kj different from 0. Hence for 

m > 0, 

2 = '^] = OCq 2 ^Okn- 1 • • *Pfc2fcl 

= ao(l - 

= «o(l “ by Lemma 9-69 

= 2 “i = -m] by symmetry. 

i 

If we multiply through by and sum on m, we obtain c^q'^ = 

We thus arrive at a hierarchy of recurrent chains:

Definition 9-71: A recurrent chain P has ergodic degree r if $b^{(r)}_0 < \infty$
but $b^{(r+1)}_0 = \infty$. If $b^{(r)}_0 < \infty$ for every r, then P is said to have
infinite ergodic degree. [It should be noted that $b^{(0)}_0 = 1$; hence the
degree is always defined.]

We may summarize our previous results as follows:

(1) Ergodic degree does not depend on the choice of the state 0.

(2) P is of ergodic degree $r > 0$ if and only if $c^{(r-1)}_0 < \infty$ but
$c^{(r)}_0 = \infty$. (The choice of 0 is immaterial.)

(3) P is of ergodic degree r if and only if $M^{(r)}$ is finite-valued but
$M^{(r+1)}$ is infinite-valued.

(4) P has the same ergodic degree as $\hat P$.

(5) If P is of infinite ergodic degree, then $M^{(r)}$ and $c^{(r)}$ are
finite-valued for all r.

For example, null chains have ergodic degree 0 and ergodic chains
have ergodic degree at least 1. We shall see in Section 6 that the basic
example may be of any degree $r = 0, 1, 2, \ldots, \infty$.

Proposition 9-72: Every finite recurrent chain has infinite ergodic 
degree. 

Proof: For a fixed state 0 the result follows by induction on r from 
Propositions 9-68, 9-67, and 9-65. 
5. Strong ergodic chains 

A strong ergodic chain is a recurrent Markov chain of ergodic degree 
2 or greater. Every strong ergodic chain is ergodic, and by Proposition 
9-72 all finite recurrent chains are strong ergodic. By Proposition 
9-65, P is strong ergodic if and only if $\alpha M$ is finite-valued. If P is 
strong ergodic, then so is $\hat{P}$. By Proposition 9-70, $\alpha M = \alpha \hat{M}$. 

In this section P is a noncyclic strong ergodic chain and $\alpha$ is chosen so 
that $\alpha \mathbf{1} = 1$. 

Proposition 9-73: If P is strong ergodic and if $f$ is a bounded function, 
then $Gf$ is finite-valued. If, in addition, $\alpha f = 0$, then $f$ is a charge and 
its potential $g$ satisfies $g = -Gf$. 

Proof: If $|f| \le k\mathbf{1}$, then 

G\f \ < kG^ = k{]^^'^D-^)^ = = k{aMY < 00 . 

Hence $G|f|$ is finite-valued. If $\alpha f = 0$, then $f$ is a charge with potential 
$-Gf$ by Theorem 9-50. 

Definition 9-74: For any noncyclic ergodic chain P, a matrix Z is 
defined by $Z = \sum_{n=0}^{\infty} (P - A)^n$ whenever that sum exists and is 
finite-valued. 

Proposition 9-75: Z exists if and only if P is strong ergodic. If P is 
strong ergodic, then $Z = A - G(I - A)$. 

Proof: Suppose P is strong ergodic. Since $\alpha(I - A) = 0$, each 
column of $I - A$ is a charge with potential 

$$-G(I - A) = \sum_{n=0}^{\infty} P^n (I - A)$$

by Proposition 9-73. By induction we verify that for $n > 0$ 

$$P^n - A = (P - A)^n$$

and hence $P^n(I - A) = P^n - A = (P - A)^n$. Therefore 

$$-G(I - A) = I - A + \sum_{n=1}^{\infty} P^n (I - A) = I - A + \sum_{n=1}^{\infty} (P - A)^n = -A + \sum_{n=0}^{\infty} (P - A)^n.$$

Hence Z exists and equals $A - G(I - A)$. 
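Conversely, suppose Z exists. Then 

$$
\begin{aligned}
(\alpha M)_j \alpha_j = (\alpha M D^{-1})_j &= (\alpha C)_j \\
&\le \liminf_n \sum_k \alpha_k \bigl(N_{jj}^{(n-1)} - N_{kj}^{(n-1)}\bigr) \quad \text{by Fatou's Theorem} \\
&= \liminf_n \bigl(N_{jj}^{(n-1)} - n\alpha_j\bigr) \\
&= \liminf_n \sum_{m=0}^{n-1} (P^m - A)_{jj} \\
&= \liminf_n \Bigl\{ \sum_{m=0}^{n-1} \bigl[(P - A)^m\bigr]_{jj} - \alpha_j \Bigr\} \\
&= Z_{jj} - \alpha_j.
\end{aligned}
$$

Hence $\alpha M$ is finite-valued, and P is strong ergodic. 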

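Proposition 9-76: If P is strong ergodic, then 

(1) dual $\hat{Z} = Z$. 

(2) $Z\mathbf{1} = \mathbf{1}$ and $\alpha Z = \alpha$. 

(3) $Z(I - P) = I - A = (I - P)Z$. 

(4) $Z(I - P + A) = I = (I - P + A)Z$. 

(5) $Z = A - G(I - A) = A - (I - A)C$. 

Proof: For (1) we have 

$$\text{dual } \hat{Z} = \text{dual}\Bigl(\sum_{n=0}^{\infty} (\hat{P} - \hat{A})^n\Bigr) = \sum_{n=0}^{\infty} \bigl(\text{dual}(\hat{P} - \hat{A})\bigr)^n = \sum_{n=0}^{\infty} (P - A)^n = Z.$$

Hence in conclusions (2) through (5) the second result is always the 
dual of the first, and we need verify only the first. Conclusion (5) 
comes from Proposition 9-75. For (2), we have 

$$
\begin{aligned}
Z\mathbf{1} = [A - G(I - A)]\mathbf{1} &= A\mathbf{1} - (G - GA)\mathbf{1} \\
&= \mathbf{1} - G\mathbf{1} + GA\mathbf{1} \quad \text{since } G\mathbf{1} < \infty \text{ by Proposition 9-73} \\
&= \mathbf{1} - G\mathbf{1} + G\mathbf{1} \\
&= \mathbf{1}.
\end{aligned}
$$

For (3), 

$$
\begin{aligned}
Z(I - P) &= [A - G(I - A)](I - P) \\
&= (-G + GA)(I - P) \\
&= -G + GA + GP - GAP,
\end{aligned}
$$

since by Proposition 9-73 all terms are finite-valued, 

$$
\begin{aligned}
&= -G + GA + GP - GA \\
&= -G(I - P) \\
&= I - A \quad \text{by Corollary 9-51.}
\end{aligned}
$$

Conclusion (4) follows directly from (2) and (3). 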
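
The conclusions of Proposition 9-76 can be checked numerically on a finite chain, which is strong ergodic by Proposition 9-72. The following is a minimal sketch in exact rational arithmetic. The three-state matrix P is a hypothetical example that does not appear in the text, and the helper names are illustrative. It obtains Z by inverting I - P + A, as conclusion (4) permits, and compares the result with partial sums of the series in Definition 9-74.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse(M):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(M)
    aug = [list(M[i]) + [F(int(i == j)) for j in range(n)] for i in range(n)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [x / d for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                m = aug[r][c]
                aug[r] = [x - m * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical 3-state noncyclic recurrent chain (an assumption, not from the text).
P = [[F(0), F(1), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]
n = len(P)
I = [[F(int(i == j)) for j in range(n)] for i in range(n)]

# alpha P = alpha and alpha 1 = 1 together give alpha (I - P + E) = 1^T,
# where E is the matrix of all ones.
W = inverse([[I[i][j] - P[i][j] + 1 for j in range(n)] for i in range(n)])
alpha = [sum(W[i][j] for i in range(n)) for j in range(n)]
A = [alpha[:] for _ in range(n)]          # every row of A equals alpha

# Conclusion (4): Z is the two-sided inverse of I - P + A.
Z = inverse([[I[i][j] - P[i][j] + A[i][j] for j in range(n)] for i in range(n)])

# Partial sums of sum_{m >= 0} (P - A)^m, the series of Definition 9-74.
PA = [[P[i][j] - A[i][j] for j in range(n)] for i in range(n)]
term, S = I, [row[:] for row in I]
for _ in range(60):
    term = matmul(term, PA)
    S = [[S[i][j] + term[i][j] for j in range(n)] for i in range(n)]
series_error = max(abs(float(S[i][j] - Z[i][j]))
                   for i in range(n) for j in range(n))

I_minus_P = [[I[i][j] - P[i][j] for j in range(n)] for i in range(n)]
I_minus_A = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]
print(alpha, series_error)
```

Exact fractions are used so that conclusions (2) and (3) can be tested as equalities rather than up to rounding; only the comparison with the truncated series is done in floating point.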

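Proposition 9-77: In a strong ergodic chain 

$$C = E Z_{dg} - Z$$

and 

$$M = (E Z_{dg} - Z)D.$$

Proof: The second assertion follows from the first, since $C = MD^{-1}$. 
For the first we have 

$$Z_{ij} - \alpha_j = \lim_n \sum_{m=0}^{n} (P^m - A)_{ij} = \lim_n \bigl[N_{ij}^{(n)} - (n + 1)\alpha_j\bigr]$$

and similarly 

$$Z_{jj} - \alpha_j = \lim_n \bigl[N_{jj}^{(n)} - (n + 1)\alpha_j\bigr].$$

Hence 

$$Z_{jj} - Z_{ij} = \lim_n \bigl[N_{jj}^{(n)} - N_{ij}^{(n)}\bigr] = C_{ij}.$$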
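
As an illustration of Proposition 9-77, the sketch below computes M = (E Z_dg - Z)D for a small chain and checks it against the usual first-step equations for mean first passage times. The three-state matrix P is a hypothetical example, and the convention that the diagonal of M is zero is read off from the formula itself, since (E Z_dg - Z) vanishes on the diagonal. The helper names are illustrative.

```python
from fractions import Fraction as F

def inverse(M):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(M)
    aug = [list(M[i]) + [F(int(i == j)) for j in range(n)] for i in range(n)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [x / d for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                m = aug[r][c]
                aug[r] = [x - m * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical 3-state noncyclic recurrent chain (an assumption, not from the text).
P = [[F(0), F(1), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]
n = len(P)
I = [[F(int(i == j)) for j in range(n)] for i in range(n)]

# Regular measure alpha with alpha 1 = 1, then Z = (I - P + A)^{-1}.
W = inverse([[I[i][j] - P[i][j] + 1 for j in range(n)] for i in range(n)])
alpha = [sum(W[i][j] for i in range(n)) for j in range(n)]
Z = inverse([[I[i][j] - P[i][j] + alpha[j] for j in range(n)] for i in range(n)])

# C = E Z_dg - Z, so C_ij = Z_jj - Z_ij; M = C D with D = diag(1 / alpha_j).
C = [[Z[j][j] - Z[i][j] for j in range(n)] for i in range(n)]
M = [[C[i][j] / alpha[j] for j in range(n)] for i in range(n)]

def first_step_ok(i, j):
    """Check M_ij = 1 + sum over k != j of P_ik M_kj, for i != j."""
    return M[i][j] == 1 + sum(P[i][k] * M[k][j] for k in range(n) if k != j)

for row in M:
    print([str(x) for x in row])
```

For this chain the passage times can also be found by hand (for instance, state 1 is reached from state 0 in exactly one step), which gives an independent check on the matrix formula.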



The next proposition shows that Z may be used as a single potential 
operator in place of both $-C$ and $-G$. Since dual $\hat{Z} = Z$, duality 
takes as simple a form as for transient potentials. Beginning in 
Section 8 we shall develop an operator $-K$ which exists for all normal 
chains and which has properties similar to those of Z. 

Proposition 9-78: In a strong ergodic chain, Z may be used as either 
a right or a left potential operator. 

Proof: If $g = -Gf$ and $\alpha f = 0$, then 

$$g = [A - G(I - A)]f = Zf$$

by Proposition 9-75. The result for signed measures is dual. 
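
The identity g = Zf can be seen at the level of partial sums. The sketch below is an illustration under stated assumptions: the three-state chain, its regular measure alpha, and the charge f are hypothetical choices, not taken from the text. It accumulates the partial sums of (I + P + ... + P^N)f and of the series defining Zf, and checks that they agree term by term when alpha f = 0 and that (I - P)g recovers f up to the vanishing remainder.

```python
from fractions import Fraction as F

# Hypothetical 3-state chain and its regular measure (assumptions for illustration).
P = [[F(0), F(1), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]
alpha = [F(1, 5), F(2, 5), F(2, 5)]
n = len(P)

def apply(X, v):
    """Matrix times column vector."""
    return [sum(X[i][k] * v[k] for k in range(n)) for i in range(n)]

# P - A, where every row of A equals alpha.
PA = [[P[i][j] - alpha[j] for j in range(n)] for i in range(n)]

f = [F(2), F(-1), F(0)]                   # alpha f = 2/5 - 2/5 = 0, so f is a charge

g_from_P, g_from_Z = f[:], f[:]           # partial sums, starting with the m = 0 term
u, w = f[:], f[:]
for _ in range(60):
    u = apply(P, u)                       # P^m f
    w = apply(PA, w)                      # (P - A)^m f
    g_from_P = [a + b for a, b in zip(g_from_P, u)]
    g_from_Z = [a + b for a, b in zip(g_from_Z, w)]

# (I - P) applied to the partial sum equals f minus the remainder P^61 f.
Pg = apply(P, g_from_P)
residual = [g_from_P[i] - Pg[i] - f[i] for i in range(n)]
print([float(x) for x in g_from_P], max(abs(float(r)) for r in residual))
```

The two partial sums coincide exactly because Af = 0 and A(Pf) = 0, so each power of P - A acts on f exactly as the same power of P does.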

The operator Z is used in Kemeny and Snell [1960] in the analysis of 
finite recurrent chains, which are all strong ergodic. A number of 
quantities associated with recurrent chains are computed in that book 
in terms of Z. The proposition to follow is a sample. 
Proposition 9-79: If P is strong ergodic, then 

(1) = ^ - 1. 

(2) Z,-j = 

(3) 



Proof: From Proposition 9-77 and 9-76, 

aM = (aEZ^, - aZ)D = (EZ^, - a)D = EZ^,D - V, 



and (1) follows. For (2), 

+ 1 = 

Finally by (1) and Proposition 9-77, 



^aj 




“i 



1 = i0'„( - 



6. The basic example 

The basic example P is recurrent if and only if $\beta_n \to 0$. Then $\beta$ is 
regular, and we choose $\alpha = \beta$. First we consider the case where P is 
null, that is, where $\sum_i \beta_i = +\infty$. 
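
The recurrence and nullity criteria can be explored numerically. The sketch below rests on an assumption about notation that is not restated in this section: the basic example is taken to be the chain on 0, 1, 2, ... that moves from i to i + 1 with probability p_i and otherwise returns to 0, with beta_0 = 1 and beta_{i+1} = beta_i p_i. Under that assumption the return time to 0 exceeds k with probability beta_k, so the mean return time is the sum of the beta_k. The function names and the two choices of p_i are illustrative.

```python
def beta(p_of, n_max):
    """Return [beta_0, ..., beta_n_max] with beta_0 = 1, beta_{i+1} = beta_i * p_i."""
    values = [1.0]
    for i in range(n_max):
        values.append(values[-1] * p_of(i))
    return values

def return_time_pmf(b):
    """Pr_0[return time = k] = beta_{k-1} - beta_k for k = 1, ..., len(b) - 1."""
    return [b[k - 1] - b[k] for k in range(1, len(b))]

# Constant p_i = 1/2: beta_i = 2^{-i}, summable, so the chain is ergodic.
b_const = beta(lambda i: 0.5, 200)
mean_const = sum(b_const)                 # truncated mean return time to 0

# p_i = (i + 1)/(i + 2): beta_i = 1/(i + 1) tends to 0 (recurrent),
# but the partial sums grow like log n (null).
b_null_short = beta(lambda i: (i + 1) / (i + 2), 10**3)
b_null_long = beta(lambda i: (i + 1) / (i + 2), 10**5)

print(mean_const, sum(b_null_short), sum(b_null_long))
```

In the second case the partial sums keep growing as the truncation point increases, which is the numerical signature of a null chain.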

Since a null chain with high probability will be outside a given 
finite set after a long time, the finite set must be re-entered from the 
left. Hence if E is finite. 



Af 



In particular. 



1 if i is the first state of E 
0 otherwise. 



ri if j < i 
\o if j > i. 



Since 



iN.. = _ 

'' 1 - 



1 



^ ifj<i 

Pi 

1 if j > i. 



we have 



if j < i 

= S = ‘A,%y = <^^‘ 

[o if j > i. 
To compute C, we note that the reverse chain $\hat{P}$ enters finite sets from 
the right. If E is finite, 



Hence 



Since 



and 



jj 






1 if i is the last state of E 
0 otherwise. 



XV JJ, 



ri if j > i 

\o if j < i. 






1 if ^‘ > i 

0 if j < i 



Pj 



J if j < i 
Pi 

0 if j > i. 



Note that C — G and that — C or —G is obtained if, in the transient 
case, we let jS 00 = +oo in the formula for N. 

Let us consider for infinite sets. For convenience we assume 
that 0 is in E. The probability of entering E in the long run at a state 
j > 0 is no greater than the probability of being between 0 and j in the 
long run, and hence Af = 0 for j > 0. For any state k, let k' be the 
next state in E (with the convention k = k' if k e E). Then 




Therefore the last term must be 0 for a small set. 

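We shall use this criterion to give an example of an ergodic set which 
is not a small set. Let $1/2 \le p < 1$ and 

$$p_i = \begin{cases} p & \text{if } i \text{ is a power of 2} \\ 1 & \text{otherwise,} \end{cases}$$

and let E be the set of all powers of 2 together with 0. Then 

$$\beta_i = \begin{cases} 1 & \text{if } i = 0 \text{ or } 1 \\ p^n & \text{if } n \ge 1 \text{ and } 2^{n-1} < i \le 2^n. \end{cases}$$

Since $\sum_i \beta_i = 2 + \sum_{n \ge 1} 2^{n-1} p^n = \infty$ if $p \ge 1/2$, the process is null. 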
On the other hand, = 1 + = since ^ < 1; hence E 

is an ergodic set. Now 

= 1, 

k Pk ic 

SO that = 0 and E is not small. 

For this example the bounded function gr = 1 is regular outside of E 
and has \^g = 0, but P^g does not tend to 0 and hence g is not a 
potential. Thus Proposition 9-42 fails if we require that E be ergodic 
but not necessarily small. 

Moreover, the same process and the same set give an example of a 
function / with support in E and having af = 0 and Gf finite-valued 
which is such that —Gf is not a potential. For any function/. 



ri j = 0 

If we let 

^0 for i = 0 and for i ^ E. 
f^ = < — I for i = 1 

1 for other i g E, 
and specialize to the case p = ^, then 



and 



a/= -1 + 2 2-" = 0 
n > 0 



(-GA 



0 for i = 0 or 1 
2 otherwise. 



By Corollary 9-17 the function 

(2 for i = 0 or 1 
[0 otherwise 

is a potential. Since the sum of — Gf and the potential gr is a constant 
vector and not a potential, —Gf cannot be a potential. We should 
note that / is not a weak charge since 

(Cf)i = (Of), = 2 /.• = 

j>i 

With minor modifications of the above example, we can make 
A^1 = Aq assume any value between 0 and 1. For example, to get 
^0 = P with i < p < 1 , redefine E to consist of 0 and all states of the 
form 2^+1. To get values of Af < redefine the process (and the 
set) so that the process can move from i to 0 only when i is a power of n. 
Now we consider the case where P is ergodic. Let 



and define 



i-1 

= 2 

fc = 0 



Then (Tq = 0 and doo = ^oo is finite for all ergodic basic examples, 

and a,. = 

The formula for still applies, but = aB^ and A^ is positive 
on all of E, In particular, if i < J, 



If i > j, then 
Therefore 






i Q 

'Aj = 2 «fc = 2 = ij - ihj- 

k k=i+l Pk 

‘Ay = 1 - ■'Aj = 1 - (i - j)ay. 

(j - i)aj if j > i 

^ + ( j - if j < i. 

To get C, we first compute O. If f > j, 

% = 2 “k '-^ki = 2 “k + 2 “k '^Oj 
k k=i k<j 

" i ['!/> 4 - 1:)] 

= r. [*'’■ - ■’<> + - 1 ;)] 



If i < j. 



Thus 



O’ 00 OC; O^oo 



‘1. = 1-4 = 1-^ + 



Gii = 



«y 



«i CToo O-oo 



a, a„ 



if j < i 



1 , “y -c ■ 

IH — n J > i, 

«i (^00 
and 



Also 



Ofy - — Oryi - 



«i O-oo 



if j > i 



«i O-oo 0£j 



if j < i. 



((i-j)cc^ ifj<i 

l(^ - + 1 if j > *• 



In an ergodic basic example, C is never equal to G. For suppose 
Cqj = Gqj for all j. Then 



or 

= jPi 

for all j. By induction, j8y = + i for every j, in contradiction to the 

fact that )3y must tend to 0 in a recurrent basic example. 

In an ergodic basic example, we have 



M 



ij 



'1 




1 


2l 










<^00 




1 




1 




1 


— 


— — 




H 


l«y 






(^00 





if j > i 



if j < i, 



since = Cfy/ay. 

It is clear that is finite if and only if Mo[to^] is finite. In other 

words, ergodic degree is independent of the state. On the other hand, 
Mo[to^] is finite if and only if Mo[fo^] is finite and if and only if 
is finite. But trivially, = (l/<^oo) 

Hence P and P are of ergodic degree at least n if and only if 2^ 
is finite. The chain with = (i/(i + l))^'*'^ has = (A; + 1)"”“^; 
for this chain 2fc ^ while 2fc the chain is of 

degree n. To obtain a chain of infinite degree, let Pi = p for every 
i > 0; such a chain represents “repetition of a single task.” Then 
= p^ and 2fc < 00 for all n. 

Turning to ergodic potentials, we know that if a/ = 0 and if Gf is 
finite-valued, then / is a charge with potential g and 



gi = -{Gf)i = 2 «//>• 

;• j<i 

Hence if a/ = 0 and if 2; J^jfj is finite, then / is a charge and its 
potential g satisfies g = — Gf. 
We can show as follows that $\alpha f = 0$ is not a sufficient condition for $f$ 
to be a charge, even in an ergodic chain. Let A be a function such 
that A > 0, aA = + 00 , / = (/ — P)A, and af = 0. For example, 
choose Pf = (H(i + 1))^^^, so that = (i + and the chain is 

ergodic; then A^ = Vi + 1 has all the required properties. For any 
such function A 



(/ + P + . • • + = (/ - P^)A 

and, by Fatou’s Theorem, 

lim inf (P^h) > Ah = +oo. 

n 

Hence / is not a charge. 

For ergodic chains, potentials among all integrable functions are 
characterized by the fact that they have integral zero — that is, = 0 
or = 0. We shall conclude this section by constructing a non- 
integrable potential. The chain we use will be the reverse of the 
fixed-task example with p = Then and 

= 1 

Po. = + i 

Let V be a row vector; the necessary and sufficient condition on v that 
V be a potential is that vP^ — > 0 and that \y{I — P)]1 = 0. Now 

p<o°; = So, and p<v = 2 

k k 

= «r 

If i > n, then Pjj>^ = + ^J^d if i < n, then 

pin) _ V pH)pin-i) _ pin-i) _ ^ 

ij ^ ^ik-^kj ^Oj ^j' 

k 

Therefore 

if i < n. 

Now (i'P'^)j = Thus vP^^ ^ 0 if and 

only if 

n 

lim V V. = 0. 

n-> 00 i = 0 

On the other hand, we are trying to construct v so that |i^|1 = +oo, 
and hence the series 2i°^o niust converge conditionally to 0. 
The condition $[\nu(I - P)]\mathbf{1} = 0$ imposes more restrictions on $\nu$. 
We have 

[v(I - P)]j = {Vj - + - OCyl/Q, 

so that \y{I — P)]1 is well defined if 2 k; ~ ^y + i| < And if 

2 Vj - ^'y + il < + 00 , then 

\y{I — P)]1 = vq — lim Vj — V ajVQ = — lim Vj, 
i j j 

We conclude that i/ is a nonintegrable potential if: 

(1) 2i^o converges conditionally to 0. 

(2) 2 Vi - + < +00. 

If vq = 0, these conditions are necessary and sufficient. It is easily 
verified that the sequence 



0 , 1 , -1 



4 ’ 4 > 4 ’ 



~ h 



i i i 

95 95 9 



satisfies conditions (1) and (2). 



i i i i_ 

95 95 95 165 

_i_ _i_ _i_ i_ 

165 165 165 1 65 *‘* 



7. Further examples 

Example 1: Independent trials process. 

In an independent trials process $P_{ij} = p_j$ independently of $i$, where 
$p_j > 0$ and $\sum_j p_j = 1$. For such a chain $P^2 = P$ since 

$$(P^2)_{ij} = \sum_k P_{ik} P_{kj} = \sum_k p_k p_j = p_j = P_{ij}.$$

It follows that P is recurrent and that $A = \lim P^n = P$; the chain is 
noncyclic ergodic and $\alpha_j = p_j$. In addition, 

$$Z = I + \sum_{n=1}^{\infty} (P^n - A) = I,$$

so that P is strong ergodic. 

If $\alpha f = 0$, then $f$ is a charge with potential $g = Zf = f$. If $\mu\mathbf{1} = 0$, 
then $\mu$ is a charge with potential $\nu = \mu Z = \mu$. 

We have 

$$\Pr_0[\bar{t}_0 = k] = (1 - p_0)^{k-1} p_0.$$

Hence P is of infinite ergodic degree. To compute the M-matrix, we 
note that for $i \ne j$ 

$$\Pr_i[t_j = k] = (1 - p_j)^{k-1} p_j$$

and therefore 

$$M_{ij} = p_j \sum_{k=1}^{\infty} k(1 - p_j)^{k-1} = p_j \cdot \frac{1}{p_j^2} = \frac{1}{p_j}.$$
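
The independent trials computations are easy to confirm for a concrete distribution. The sketch below is an illustration only: the three probabilities are a hypothetical choice, and the matrix M is formed from the relation M = (E Z_dg - Z)D of Proposition 9-77 with Z = I and D the diagonal matrix with entries 1/alpha_j = 1/p_j. A truncated geometric sum gives an independent floating-point check of the mean passage time.

```python
from fractions import Fraction as F

p = [F(1, 2), F(1, 3), F(1, 6)]           # hypothetical trial probabilities
n = len(p)
P = [p[:] for _ in range(n)]              # every row of P is the same distribution

def matmul(X, Y):
    """Product of two n-by-n matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)

# With A = P and P^m = P for m >= 1, every term P^m - A vanishes, so Z = I.
Z = [[F(int(i == j)) for j in range(n)] for i in range(n)]

# M = (E Z_dg - Z) D, with D = diag(1 / p_j).
M = [[(Z[j][j] - Z[i][j]) / p[j] for j in range(n)] for i in range(n)]

def geometric_mean_time(pj, terms=2000):
    """Truncated sum of k (1 - pj)^(k-1) pj, the mean of a geometric waiting time."""
    q = 1.0 - float(pj)
    return sum(k * q ** (k - 1) * float(pj) for k in range(1, terms + 1))

print([[str(x) for x in row] for row in M], geometric_mean_time(p[2]))
```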
Example 2: Reflecting random walk. 

We return to the reflecting p-q random walk of Example 1 of Section 
6-7. We first determine its ergodic degree. From 0 the chain returns 
to 0 either in one step or in an even number of steps, say $2n$, and 

$$\Pr_0[\bar{t}_0 = 1] = q.$$

Each path returning to 0 in $2n$ steps has $n$ steps to the right and $n$ to 
the left. Thus the probability of such a path is $(pq)^n$. From Feller 
[1957], p. 71, we see that the number of such paths is $\frac{1}{n}\binom{2n-2}{n-1}$. 
Therefore 

$$\Pr_0[\bar{t}_0 = 2n] = \frac{1}{n}\binom{2n-2}{n-1}(pq)^n.$$

We know from earlier results that the chain is recurrent if and only if 
$p \le 1/2$. The moments of the return time to 0 in this case are 

$$M_0[\bar{t}_0^{\,r}] = q + \sum_{n=1}^{\infty} (2n)^r \frac{1}{n}\binom{2n-2}{n-1}(pq)^n.$$

By Stirling's formula, for large $n$ 

$$(2n)^r \frac{1}{n}\binom{2n-2}{n-1}(pq)^n \sim c\,(4pq)^n n^{r - 3/2},$$

where $c$ is a constant. If $p = 1/2$, then $4pq = 1$, the series converges 
only for $r = 0$, and the chain is null. But if $p < 1/2$, then $4pq < 1$ and 
all moments are finite. We may summarize as follows: 

$$\text{If } \begin{cases} p > 1/2 \\ p = 1/2 \\ p < 1/2 \end{cases} \text{ then P is } \begin{cases} \text{transient} \\ \text{null} \\ \text{ergodic of infinite degree.} \end{cases}$$
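
The return-time distribution of the reflecting walk can be tabulated directly. The sketch below is a numerical illustration with illustrative function names. It builds the probabilities of first return at time 2n from the ratio of consecutive path counts, which avoids very large binomial coefficients, and then looks at the total mass and the truncated mean for p = 0.3 and for p = 1/2. The closed form q/(q - p) for the mean when p < 1/2 is a standard consequence of the generating function of the path counts and is not stated in the text.

```python
from math import comb

def even_return_probs(p, n_max):
    """Pr_0[return time = 2n] for n = 1, ..., n_max in the reflecting p-q walk."""
    x = p * (1.0 - p)
    probs, term = [], x                   # n = 1: a single path of probability pq
    for n in range(1, n_max + 1):
        probs.append(term)
        # Ratio of consecutive path counts is 2(2n - 1)/(n + 1); one more factor pq.
        term *= 2.0 * (2 * n - 1) / (n + 1) * x
    return probs

def partial_mean(p, n_max):
    """Truncated mean return time: q + sum over n <= n_max of 2n Pr[2n]."""
    probs = even_return_probs(p, n_max)
    return (1.0 - p) + sum(2 * n * pr for n, pr in enumerate(probs, start=1))

p = 0.3
probs = even_return_probs(p, 400)
total = (1.0 - p) + sum(probs)            # should be close to 1 when p <= 1/2
mean = partial_mean(p, 400)               # should be close to q/(q - p)

print(total, mean, partial_mean(0.5, 100), partial_mean(0.5, 10000))
```

For p = 1/2 the truncated mean keeps growing with the cutoff, roughly like the square root of the number of terms, in line with the divergence of the series for r = 1.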

If p < J, we can find C^j = M^^aj from the calculation of in 
Section 6-7. The reflecting random walk satisfies P = P, and hence 

Gu = = ^ = a,M,, 

If ^ the process is with high probability far from state 0 after 
a long time. Hence *Ay = 1 ifj > i- Thus 

" ” to if j < i. 
We may compute ${}^iN_{jj}$ for $j > i$ as follows. For $j > 0$, 

= h + 



= -^ + from Section 5-4 

2 2 j 



= 1 - 



1 



Therefore, = 2j and ^Njj = = 2(j - i). Thus 

r 2( j - i) if i > i 

[0 if J 

To find the C matrix, we note that 



Cij = — Gji since P = P 



= Gyi 



since a = 1^. 



In the case p = ^ the condition «/ = 0 is the condition fj = 
Then 

(m = 2 2 O' - »)// 



;>i 



= 2 2 0 - i)fi + 2 2 (* - 

i j<i 

= 22jiA + 2 2 (i -i)/,- 



;•<< 



Now 



{Cf)i = (G*7)y = = 2 2 m - i), 

i<J 



and the right side is always finite. Thus if Gf is finite-valued and 
af= 0, / is a weak charge. That is, if 2y fj = 0 and if 2y j \fj\ < 
then 

9i= 2 J, (i - M 



j<i 



is a bounded potential. 



Example 3: Sums of independent random variables on the line. 

We consider one of the chains covered in Proposition 5-22; they 
represent sums of independent random variables on the line with 
finitely many fc-values. If the mean of the A;- values is zero, the process 
is recurrent. And since a = 1^, such a chain must be null. 
In a recurrent process of this type 

= [N\f - N\^^] = [Nfj^ - 

It follows that G exists if and only if C exists, and if they both exist, 
then C = G, We are going to show that for this process Uy does 
indeed exist for all i and j and hence the process is normal. 

Noting that 

‘A, = 

n k 

we write for fixed N 



k k=-N k<-N k>N 

Suppose for the moment that the two limits 

= lim 
+ 00 

and 

lim 

- 00 

exist. Choose N large enough so that 

< e 

and 

< € 

for k > N\ Then 



= i 2 

k k=-N \k<-N ] 

+ 2 nv) + 2 

\k>N J k 

where 

fO for —N<k<N 

= V 

otherwise. 



Since \€f^\ < e, the last term on the right is less than € in absolute value 
for every n. Moreover, the first term on the right tends to zero with 
n, since P is null. Finally by the Central Limit Theorem (Theorem 
1 - 68 ) 

lim 2 ni' = 2 = I- 

n k< -N n k>N 



Therefore exists and satisfies 



‘Ay = i‘P-oo.y + 4 ‘^ + 00 .; 
The missing step in the argument is the proof that j and 

j exist. We shall now show that j exists; the other proof is 
similar. Since ‘Ay = and °A_y = ^Aq = 1 - °Ay, it suffices to 

prove that _ooj exists for j >0. If is the largest A;-value and 
E = {0, . . . , v}, then 

seE 

since neither 0 nor j can be reached from the negative side except 
through E. If we can show that = lim;,^.^ jBf ^ exists, then 

since E is finite, we will have 

seE 

Thus form the ladder process P^. Then p^ = 0 unless 0 < j < v. 
By the special choice of E we have + Letting 

fj = Vt we see that Uq = \ and 

n n - 1 n - 1 

2 Fk ^ -{n-k),0 ~ 2 Pn-k^ ~k,0 ~ 2 fn-k'^k' 
k = l k = 0 k = 0 

Hence by the Renewal Theorem (Theorem 1-67), we have 

lim Htn.o = A-’ 

n-*oo /X 

where = 2jP^- But and 

Htk.s = + 2 

j<S 

Since we can prove by induction on s that 

(P'^)?^oo.s exists. Therefore ^H_^j exists, and P is normal. 

We conclude this section by treating the case of the one-dimensional 
symmetric random walk, in which p_^ = p_^_^ = ^. For j > 0, it is 
clear that j — 0 and j = 1. Hence ‘Ay = \ and G^j = 

\ in agreement with Corollary 9-29. For j > 0, °iVyy is the same 
as for Example 2 with $p = 1/2$, since the two processes stopped at 0 are 
identical. Hence ${}^0N_{jj} = 2j$. If $j > i$, then 

$${}^iN_{jj} = {}^0N_{j-i,\,j-i} = 2(j - i),$$

whereas if $j < i$, then ${}^iN_{jj} = {}^0N_{i-j,\,i-j} = 2(i - j)$. Hence 

$$G_{ij} = \tfrac{1}{2}\,{}^iN_{jj} = |i - j|.$$

Thus the potential operator is the absolute value of the distance, just 
as in classical one-dimensional potential theory. The conditions 
$\alpha f = 0$ and $Gf$ finite are the conditions $\sum_j f_j = 0$ and $\sum_j |j|\,|f_j| < \infty$. 

Since $Cf = Gf$, we see that if $\sum_j f_j = 0$ and $\sum_j |j|\,|f_j| < \infty$, then 

$$g_i = -\sum_j |i - j|\, f_j$$

is a bounded potential. 
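
The formula for the potential in the symmetric random walk can be tried on a charge of finite support. In the sketch below the charge f is a hypothetical example whose entries sum to zero, and the function names are illustrative. It evaluates g_i as minus the sum of |i - j| f_j and checks that (I - P)g returns f, where Pg averages the two neighbouring values, and that g is constant outside the support.

```python
# Hypothetical charge of finite support; its entries sum to zero.
f = {0: 1, 2: -3, 5: 2}

def g(i):
    """g_i = - sum over j of |i - j| f_j."""
    return -sum(abs(i - j) * fj for j, fj in f.items())

def charge_of(i):
    """((I - P) g)_i for the symmetric walk, where (P g)_i = (g_{i-1} + g_{i+1}) / 2."""
    return g(i) - (g(i - 1) + g(i + 1)) / 2

values = [g(i) for i in range(-3, 9)]
print(values)
```

To the right of the support g equals the sum of j f_j, and to the left it equals the negative of that sum, which is why the potential is bounded.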

8. The operator K 

We introduce a new matrix to serve as potential operator for recurrent 
chains. The operator K will combine many of the properties of 
C and G and will have the single drawback that some of its entries may 
be negative. Our procedure will be first to tie the K-operator into our 
present notion of potential, then to define in terms of K the recurrent 
version of capacity, and finally to introduce so-called generalized 
potentials, in which we do not require total charge zero. With the 
generalized potentials we shall be able to prove analogs of the classical 
potential principles. 

Let a distinguished state 0 be specified. 

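Definition 9-80: The K-matrix is defined by 

$$K_{ij} = \lim_{n \to \infty} \Bigl[ N_{00}^{(n)} \frac{\alpha_j}{\alpha_0} - N_{ij}^{(n)} \Bigr]$$

whenever the limit exists. 

Lemma 9-81: If the indicated entries of C and G exist, then 

$$K_{ij} = C_{ij} + (G_{0j} - C_{0j})$$

and 

$$K_{ij} = G_{ij} + (C_{i0} - G_{i0}) \frac{\alpha_j}{\alpha_0}.$$

Proof: 

$$
\begin{aligned}
K_{ij} &= \lim_n \Bigl[ N_{00}^{(n)} \frac{\alpha_j}{\alpha_0} - N_{ij}^{(n)} \Bigr] \\
&= \lim_n \Bigl[ N_{00}^{(n)} \frac{\alpha_j}{\alpha_0} - N_{0j}^{(n)} \Bigr] - \lim_n \bigl[ N_{jj}^{(n)} - N_{0j}^{(n)} \bigr] + \lim_n \bigl[ N_{jj}^{(n)} - N_{ij}^{(n)} \bigr] \\
&= (G_{0j} - C_{0j}) + C_{ij}.
\end{aligned}
$$

By Proposition 9-45, 

$$C_{ij} + (G_{0j} - C_{0j}) = G_{ij} + (C_{i0} - G_{i0}) \frac{\alpha_j}{\alpha_0},$$

and hence the second formula holds as well. 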
Lemma 9-82: K exists if and only if P is normal. 

Proof: If P is normal, then C and G exist, and hence K exists by 
Lemma 9-81. Conversely, suppose K exists. Since $K_{i0} = C_{i0}$ and 
$K_{0j} = G_{0j}$, $C_{i0}$ and $G_{0j}$ exist for all $i$ and $j$. Thus P is normal by 
Theorem 9-26. 

Lemma 9-83: $\hat{K}$ = dual K. 

Proof: Since $\hat{G}$ = dual C and $\hat{C}$ = dual G, we have 

$$
\begin{aligned}
(\text{dual } K)_{ij} &= \frac{\alpha_j}{\alpha_i} \Bigl[ G_{ji} + (C_{j0} - G_{j0}) \frac{\alpha_i}{\alpha_0} \Bigr] \\
&= \hat{C}_{ij} + (\hat{G}_{0j} - \hat{C}_{0j}) = \hat{K}_{ij}.
\end{aligned}
$$

The fact that $\hat{K}$ = dual K is the key property that the K matrix 
has and the C and G matrices lack. It is what is behind our first 
important result. 

Theorem 9-84: Let 

$$\nu = \lim_n \, [\mu(I + P + \cdots + P^n)]$$

and 

$$g = \lim_n \, [(I + P + \cdots + P^n)f]$$

be potentials of weak charges in a normal chain P. Then $\nu = -\mu K$ 
and $g = -Kf$. If Y is any matrix such that for all potentials $\nu$ and $g$ 
of finite support $\nu = \mu Y$ and $g = Yf$, then $Y = -(K + k(\mathbf{1}\alpha))$, where 
$k$ is a constant. 

Proof: By Theorem 9-31, $\nu = -\mu C$ and $g = -Gf$. But from Lemma 
9-81 and the fact that $\mu\mathbf{1} = \alpha f = 0$, we see that $-\mu K = -\mu C$ and 
$-Kf = -Gf$. Hence K has the desired property. If Y is an operator 
that serves for charges $\mu$, then $\mu Y = -\mu K$, so that $\mu(Y + K) = 0$. 
Taking $\mu$ to be the row vector with 1 in the $i$th entry and $-1$ in the $j$th 
entry, we see that $\mu$ is a charge and that the $i$th and $j$th rows of $Y + K$ 
are equal. Hence $Y + K$ has constant columns. A similar argument 
with potential functions shows that $Y + K$ has rows proportional to $\alpha$. 
Therefore $Y + K = -k(\mathbf{1}\alpha)$ for some $k$. 

Actually at any time when $\nu = -\mu C$ or $g = -Gf$, we may use K 
in place of C and G. Thus K serves for both functions and measures. 
And the theorem shows that the only other two-sided potential 
operators differ from $-K$ by a multiple of $\mathbf{1}\alpha$. 
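
The relations among C, G, and K in Lemma 9-81 and the first assertion of Theorem 9-84 already hold before the limit is taken, which makes them convenient to test exactly. The sketch below is an illustration under stated assumptions: the three-state chain and its regular measure are hypothetical, and C and G are taken to be the limits of N_jj - N_ij and of N_ii alpha_j / alpha_i - N_ij respectively, matching the way they are used in this chapter. The truncated matrices are built from N = I + P + ... + P^n for a fixed n.

```python
from fractions import Fraction as F

# Hypothetical 3-state chain and its regular measure (assumptions for illustration).
P = [[F(0), F(1), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]
alpha = [F(1, 5), F(2, 5), F(2, 5)]
n = len(P)

def matmul(X, Y):
    """Product of two n-by-n matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def truncated_operators(steps):
    """Return N, C, G, K built from N = I + P + ... + P^steps (no limit taken)."""
    power = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    N = [row[:] for row in power]
    for _ in range(steps):
        power = matmul(power, P)
        N = [[N[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    C = [[N[j][j] - N[i][j] for j in range(n)] for i in range(n)]
    G = [[N[i][i] * alpha[j] / alpha[i] - N[i][j] for j in range(n)]
         for i in range(n)]
    K = [[N[0][0] * alpha[j] / alpha[0] - N[i][j] for j in range(n)]
         for i in range(n)]
    return N, C, G, K

N, C, G, K = truncated_operators(40)
K_next = truncated_operators(41)[3]
drift = max(abs(float(K_next[i][j] - K[i][j]))
            for i in range(n) for j in range(n))

f = [F(2), F(-1), F(0)]                   # alpha f = 0, so f is a charge
minus_Kf = [-sum(K[i][j] * f[j] for j in range(n)) for i in range(n)]
Nf = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]
print(drift, [float(x) for x in minus_Kf])
```

Since -Kf and (I + P + ... + P^n)f agree exactly at every truncation when alpha f = 0, the statement g = -Kf follows as soon as either side converges. The small value of drift indicates that the truncated K has essentially reached its limit for this chain.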
We note from the definition of K that if $C = G$, then $K = C = G$. 
Thus in the classical cases (the symmetric random walks), the three 
operators coincide. 

We introduce the notation 

// - 0 \ 

= %lcci, If = ^f/a^ and = 1 ^ ^1* 

Thus = dual and F = dual 

Proposition 9-85: If P is normal and is a finite set, then 
(-K)W^ = - 1A^ 

and 

W^{-K) = - Fa, 

Proof: Lemma 9-37 proves that a column of — 1 is a potential 
with charge the corresponding column of W^. The first result then 
follows from Theorem 9-84. The second result follows by duality 
from Proposition 6-16. 

Corollary 9-86: If P is normal and ^ is a finite set, then 
-K^(I - P^) = I - 1Af 

and 

(/ _ P^){-K^) = I - Ifa,. 



Proof: Restrict the equations of Proposition 9-85 to square matrices 
indexed by the states of E. 

We shall see shortly how Corollary 9-86 may be used to compute 
$P^E$ from $K_E$. Although we have the formula $P^E = T + UNR$ for 
$P^E$, this expression is not of practical value for infinite chains, since N 
is indexed by the states of $\tilde{E}$. On the other hand, $K_E$ can be computed 
without finding all of K, and hence $P^E$ can be calculated from 
$K_E$ for finite sets E by using only finite matrices. 

From the fact that $-K$ is a two-sided potential operator, we see 
from the proofs of Proposition 9-85 and Corollary 9-86 that 

(/ - P^)Ce = -/ + 

and 

Ge{1 - P^) = -1 + 1A^. 

We turn now to a discussion of capacity. Throughout the remainder 
of the section we shall assume that P is a normal chain. 
Proposition 9-87: For any finite set E there is a constant $k(E)$ such 
that 

= k{E)aE and KeIe = HE)^. 

Furthermore, 

k(E) = X^KeII and k{E) = k{E), 

Proof: In Corollary 9-86, multiply the first equation by Ke on the 
right and the second by on the left and equate the results. Then 

1 (^e^e) — (^e^e)^e- 

Thus for some constant k(E), we have KeIe = k{E)^ and Xe^e = 
k(E)aE. Multiplication of = k(E)aE on the right by Zf gives 

k(E) = XeKeIe, since a^Zf = 1. The dual of this equation is k(E) = 
^fi?£;Zf, and thus k{E) = k(E). 

Definition 9-88: For a finite set E the constant k(E) such that 
KeIe = k{E)^ is called the capacity of E. 

Just as the iC-matrix in general depends upon the state 0 selected, 
so capacity in general is a function of the distinguished state. If 
E = {i}, then Xf = and from Proposition 9-87 we see that 

^ii — 

or 

^({^}) ~ i^Oi ~ 

In particular, k({0}) = 0. 

If we form a K' matrix by using a distinguished state O', then 
^ij - Kij = - Cq^j) - (Gqj- - Cqj) 

— (C^OO' “ ^00')^jl^0' 

by Proposition 9-45. But K = K' k^ahy Theorem 9-84. Thus 
k = ^0° -— ^00' = ifc({0'}) 

CCQf 

and 

K = K' + jk({0'})1a. 

Therefore, from Proposition 9-87 we see that 

k(E) = k'(E) -t- k({0'}) 

for every finite set E. Thus capacity is determined up to an additive 
constant. If we let E = {0}, we find that 0 = A:'({0}) -h A:({0'}) or that 
Note that since A:({0'}) = (Gqo' ~ C! oo') I ■> when 
C = G the capacity does not depend upon the choice of 0. 
In general, there is no reason why $k(E)$ should be positive. However, 
if 0 is in E, then 

k(E) = = {Gl^)o > 0, 

since Kl^ is constant on E. The dual expression is 

a^k{E) = (A^Oo. 

In the following discussions we shall restrict ourselves to sets which 
contain the reference point. We accordingly denote by the 
collection of all finite sets which contain 0. 

Proposition 9-89: If E is in jSfgj then k{E) > 0 if and only if there is a 
j in E such that °Ay > 0 and > 0 (or, equivalently, there is j in E 
such that > 0 and Af > 0). 

Proof: For E g J5fo» 

k(E) = 2 

jeE 




Thus k{E) > 0 if and only if there is a non-zero term. Since > 0, 

the first assertion of the proposition follows. The equivalent condition 
follows by duality. 

Proposition 9-90 : If P is noncyclic ergodic or if P is a symmetric 
sums of independent random variables process, then k(E) > 0 for all 
sets E in with more than one point. 

Proof: If P is ergodic, then A^ = aB ^ by Proposition 9-55. Hence 
Af > ocy > 0 for j G E. Since P is also ergodic, °Ay > 0 for j i=- 0. 
Therefore, if E contains a state other than 0, then k{E) > 0 by 
Proposition 9-89. 

If P is a symmetric sums of independent random variables process, 
then ^ 3 ^ h by Corollary 9-29. If Zf > 0, then 

k[E) > ifyo^o > ^ other state j g E . If = 0, then if > 0 for 

some j G P, and k{E) > KQjlf > 0. 

Proposition 9-91: For any set E in exists if and only if 

k{E) > 0 . If k{E) > 0 , then 
Proof: If k{E) = 0, then = 0 and Ke~^ does not exist. If 

k{E) > 0, then by Corollary 9-86, 

= / — 1 A| -h 1 Af by Proposition 9-87 
= I. 

From Proposition 9-91 we obtain the following expression for in 
terms of and a^. 



Corollary 9-92: If is a set in jS^q with k{E) > 0, then 



= I + 






Proof: By Proposition 9-91, 



77TTT Af . 

k(E) ^ ^ 



Multiplying through on the right by 1 gives 



K^-^^ = 



IE 

k(E) 



If we multiply both sides by a^, we get 



k(E) = 



1 



XEK^-^^ 



Hence 



By duality. 



JE_ 









and substitution for k{E) H%X% gives the desired result. 



We now begin a series of lemmas which lead to the fact that k{E) is 
a Choquet capacity in the sense of Definition 8-38. 



Lemma 9-93: If ^ is a finite set, then 

lim = Ha - 
Proof: By Proposition 6-17, 



£E^in) __ ^(n) _ Ej^pn + 1 _ 

Now 

dual = P»^ + i ]jy Proposition 9-47, 

so that 

EJ^pn^l E^^ 

Lemma 9-94: If P is a finite set, then 

B^K = K + - ^da 

and 

= k + ^ ^v. 



Proof: Each row of — / is a weak charge. By Theorem 9-84 
and Lemma 9-93, 

- 1)K = -lim (B^ - - Ha. 

n 



The second result is dual. 



Lemma 9-95 : If P is a finite set, then 

X^K = k(E)a -f 

and 

Kl^ = k{E)^ -f H. 



Proof: Multiply the equation of Lemma 9-94 on the right by to 
obtain 

B^{Kl^) = Kl^ + - H(al^) 

= Kl^ - H. 

But (KI^)e = which by Proposition 9-87 equals k[E)'\ , Hence 

Kl^ = yfc(P)1 + H. 

The other result is dual. 

Lemma 9-96: If E, F, and L are finite sets with E and F both 
contained in L, then 

[k{E) - k{F)] = X\H - H) = - H)IF 
Proof: Multiplying the equation of Lemma 9-95 through by A^, we 
obtain 

X^Kl^ = k(E) -h H 

and similarly 

X^Kl^ = k(F) -f- A^ ^d. 

Subtracting, we see that it is sufficient to prove that X^Kl^ = X^KF. 
But 

X^Kl^ = (XiK,)lf = k{L)aJf = k{L) 

since the support of F is in L. Similarly, X^KF = k{L). The other 
equality is dual. 

Lemma 9-97: For any sets $A_1, \ldots, A_r$ which have at least one 
point in common, 

$${}^{A_1 \cap \cdots \cap A_r}N_{ij} \ge \sum_s {}^{A_s}N_{ij} - \sum_{s<t} {}^{A_s \cup A_t}N_{ij} + \sum_{s<t<u} {}^{A_s \cup A_t \cup A_u}N_{ij} - \cdots + (-1)^{r+1}\, {}^{A_1 \cup \cdots \cup A_r}N_{ij}.$$

Proof: The left side is the mean number of times in state $j$, starting 
in $i$, before the intersection of sets is reached. We shall show that the 
right side is the mean number of times in $j$ before all of the sets have 
been reached. The former is clearly at least as large as the latter. 
Let $n_j(\omega)$ be the number of times on $\omega$ that the process is in $j$ before 
all sets are entered. On the path $\omega$ let $S_k$ be the set of times at which 
the process is in $j$ before $A_k$ is entered. Then $n_j(\omega)$ is the cardinal 
number of $S_1 \cup \cdots \cup S_r$, which equals 

$$\sum_s n(S_s) - \sum_{s<t} n(S_s \cap S_t) \pm \cdots,$$

where $n(A)$ is the cardinal number of A. Since $n(S_1 \cap \cdots \cap S_m)$ is the 
number of times in $j$ before $A_1 \cup \cdots \cup A_m$ is entered, the result follows. 

Proposition 9-98: Capacity is a monotone increasing set function, 
and for any sets $A_1, A_2, \ldots, A_r$ in $\mathscr{L}_0$ 

$$k(A_1 \cap \cdots \cap A_r) \le \sum_i k(A_i) - \sum_{i<j} k(A_i \cup A_j) + \sum_{i<j<m} k(A_i \cup A_j \cup A_m) - \cdots + (-1)^{r+1} k(A_1 \cup \cdots \cup A_r).$$

Proof: We first prove monotonicity. Let $A \subset B$; then 

> ^N. 



Thus 



pn A]^ > pn 



As CO, 



> ^v. 



Therefore 



[^v - ^v)l^ > 0 . 



If L contains A and B, then by Lemma 9-96, k{B) — k{A) > 0 or 

k{B) > k{A). 

Therefore k is monotone. The other inequality of the proposition 
follows in exactly the same manner starting from the result of Lemma 
9-97. 



In the next section we shall prove potential principles for sets of 
positive capacity. It is therefore particularly annoying when all sets 
in a chain have capacity zero, and we treat this case now. 

Definition 9-99: A normal chain is degenerate if, for every choice of 
the reference state 0, $k(E) = 0$ for all finite sets E. 

Lemma 9-100: If, for a single reference point 0, $k(E) = 0$ for all 
$E \in \mathscr{L}_0$ and for all one-point sets, then the chain is degenerate. 

Proof: If $0'$ is any other reference point, then 

$$k(E) = k'(E) + k(\{0'\}) = k'(E).$$

Hence it suffices to show that $k(E) = 0$ for every finite set E. If 
$0 \in E$, then $k(E) = 0$ by hypothesis. If $0 \notin E$, let $0'$ be any point of E, and 
$0 = k'(\{0'\}) \le k'(E) = k(E) \le k(E \cup \{0\}) = 0$. Both inequalities follow 
by the monotonicity proved in Proposition 9-98. 

Lemma 9-101: A normal chain is degenerate if and only if for every 
pair of states i and j either ^Xj = 0 and = I or ^Xj = 1 and = 0. 

Proof: If the chain is degenerate, then k({i,j}) = 0 if ^ is chosen as 
the reference point. Hence ^Xj = 0 or = 0 by Proposition 9-89. 
By symmetry = 0 or ^X^ = 0, and hence ^Xj = 1 or ^Xj = 1. 
Conversely, if °Ay = 0 or ^'Xj = 0 for all states then k{E) = 0 for all 
E E JSfo (by Proposition 9-89 since Xf < °Ay). And 

m) = ^ («0> - Co,) = (“A, - ^o). 

«0 

But °Ay and either both 0 or both 1. Hence k({j}) = 0. Thus 

the converse follows from Lemma 9-100. 

The basic example and its reverse are degenerate when null. It can 
be shown that degenerate chains are all of a type similar to the basic 
example. (See the problems.) 

Proposition 9-102: If P is not degenerate, then there are two states 0 
and 1 such that, with 0 as reference point, $k(E) > 0$ for all sets 
containing both states. 

Proof: Choose an arbitrary point as the reference state 0. Then, by 
Lemma 9-100, there is either a one-point set or a set in $\mathscr{L}_0$ which has 
non-zero capacity. 

If $k(\{1\}) > 0$ for some state, then we may choose 0 and 1 as indicated, 
since $k(E) \ge k(\{1\}) > 0$ if $1 \in E$. If $k(\{1\}) < 0$ for some state, then 
use 1 as a new reference point, and $k'(\{0\}) = -k(\{1\}) > 0$. Hence we 
simply interchange the roles of 0 and 1. 

Otherwise $k(E) \ne 0$ for some $E \in \mathscr{L}_0$. Then $k(E) > 0$. Let E be a 
set in $\mathscr{L}_0$ with positive capacity and containing as few states as 
possible, let 1 be a state of E other than 0, and $F = E - \{1\}$. By 
minimality, $k(F) = 0$. Thus, by Proposition 9-98, 

$$0 = k(\{0\}) = k(F \cap \{0, 1\}) \le k(F) + k(\{0, 1\}) - k(E),$$

and 

$$k(\{0, 1\}) \ge k(E) > 0.$$

Hence any set containing 0 and 1 also has positive capacity. 

We conclude this section by applying our results to ergodic chains. 

Proposition 9-103: If P is ergodic, $(\lambda^E M)_0 = k(E)$. 

Proof: Since $\alpha_0 k(E) = (\lambda^E C)_0$, we have $k(E) = (\lambda^E M)_0$. 

This proposition enables us to give an interpretation to $k(E)$. Since 
$\lambda^E = \alpha B^E$, $k(E) = (\alpha B^E M)_0 = M_\alpha$[time to reach 0 after E is entered] $= 
M_\alpha[t_0 - t_E]$. This function is monotone in E since $t_0 - t_E$ is monotone 
on each path. The capacity inequalities follow from the fact that the 
time to enter all of $n$ sets is no greater than the time to enter the 
intersection; hence the time to reach 0 from the intersection is no greater 
than the time to reach 0 after all $n$ sets are entered. We also see why 
$k(E) > 0$ if E has more than one state. 

Corollary 9-104: $(\lambda^E M)_j = k(E)$ for all $j \in E$.

Proof: Choose $j \in E$ as reference point. Then

$$(\lambda^E M)_j = k'(E) = k(E).$$



Proposition 9-105: In an infinite ergodic chain $K + k1\alpha$ has negative
entries for each $k$.

Proof: 

{K -f A;1a)yy = (?o; “ C'o; + 

- 

Hence 

liixi k^a)jj _ Proposition 9-63. 

Proposition 9-106: Let {E^} be an increasing sequence of finite sets 
with union the set of all states S, Then lim^_^oo if and 

only if the chain is strong ergodic. In a strong ergodic chain, k{E) = 
and hence 

Proof: k{E) = Mo,[to — t^;] and therefore 

2 aiM^Q < k(E) < if„o- 

ieE 

Hence as E^ S, k{E^) — > which is finite only in a strong ergodic 

chain. Then 

^ aO ~~ ^ aE — k(E) = k{E) = MccO ~ ^ aE' 

Hence since aM = aM. 

For strong ergodic chains the concept of capacity can be extended to
infinite sets, and Proposition 9-103, Corollary 9-104, and Proposition
9-106 hold for all sets E.

9. Potential principles

In this section we shall assume that P is a nondegenerate normal
chain. We shall assume that 0 and 1 are chosen according to Proposition
9-102, and we denote the collection of all finite sets containing
both 0 and 1 as $\mathcal{L}$. Thus capacity is uniquely defined, $k(E) > 0$ for
$E \in \mathcal{L}$, and $K_E^{-1}$ exists by Proposition 9-91.

Definition 9-107: Let $f$ be any function whose non-zero components
are on a finite set $E$. Then $f$ is the charge of the generalized potential
$g = -Kf$, with support $E$ and total charge $\alpha f$. If $f \ge 0$, then $g$ is a
pure potential. Let $\mu$ be any signed measure whose non-zero components
are on $E$. Then $\mu$ is the charge of the generalized potential
measure $\nu = -\mu K$, with support $E$ and total charge $\mu 1$.

In this section $g$ and $f$ will be used only for generalized potentials and
their charges.

Proposition 9-108: $g$ determines $f$, and if the support is in $E \in \mathcal{L}$, then
$g_E$ determines $g$.

Proof: Since $E \in \mathcal{L}$, $k(E) > 0$, $K_E^{-1}$ exists, and $f_E = -K_E^{-1}g_E$.

To see that $g$ determines $f$ even if the support $F$ fails to contain 0 or 1,
we simply let $E = F \cup \{0, 1\}$.

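Proposition 9-109: For any $g$ with support in $E$, $E \in \mathcal{L}$,

$$\lambda^E g = -(\alpha f)k(E)$$

$$g = B^E g - (\alpha f)\,{}^E d$$

$$(I - P^E)g_E = f_E - (\alpha f)\,l^E.$$

Proof: Since $\lambda^E K_E = k(E)\alpha_E$ by Proposition 9-87, $-\lambda^E g = (\alpha f)k(E)$.

Since $B^E K = K + {}^E N - {}^E d\,\alpha$ by Lemma 9-94, $-B^E g = -g - {}^E d(\alpha f)$.

Since $(I - P^E)(-K_E) = I - l^E\alpha_E$ by Corollary 9-86,

$$(I - P^E)g_E = f_E - (\alpha f)\,l^E.$$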



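Proposition 9-110: If $g$ is a pure potential and $E$ is any set in $\mathcal{L}$,
not necessarily the support of $g$, then

$$-\lambda^E g \ge (\alpha f)k(E)$$

$$g \ge B^E g - (\alpha f)\,{}^E d$$

$$(I - P^E)g_E \ge f_E - (\alpha f)\,l^E.$$

Proof:

$$\begin{aligned}
-\lambda^E g &= \lambda^E K f \\
&= (k(E)\alpha + {}^E\nu)f && \text{by Lemma 9-95} \\
&\ge k(E)(\alpha f) && \text{since } {}^E\nu \ge 0 \text{ and } f \ge 0.
\end{aligned}$$

$$\begin{aligned}
-B^E g &= B^E K f \\
&= Kf + {}^E N f - (\alpha f)\,{}^E d && \text{by Lemma 9-94} \\
&\ge Kf - (\alpha f)\,{}^E d
\end{aligned}$$

or

$$B^E g \le g + (\alpha f)\,{}^E d.$$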

By Proposition 9-85, 

l^N\ 

W^{-K) = ( I - Fa. 

Hence 

(I - P^)g, = ^Nf - (af)l§ 

^ /e ~ 

since = 1 for i e E and / > 0. 

The recurrent Maximum Principle is the following corollary. 

Corollary 9-111: A generalized potential of non-negative total charge
assumes its maximum on the support.

Proof: By Proposition 9-109, $g = B^E g - (\alpha f)\,{}^E d$. But $\alpha f$ and ${}^E d$
are both non-negative. Hence $g \le B^E g$, and $g_i \le \max_{j \in E} g_j$.

Lemma 9-112: If $g'$ is a generalized potential with support in $E \in \mathcal{L}$
and if $g$ is any pure potential such that $g_E \ge g'_E$, then $\alpha f' \ge \alpha f$.

Proof: By hypothesis and by Propositions 9-109 and 9-110,

$$0 \ge \lambda^E(g' - g) \ge (-\alpha f')k(E) + (\alpha f)k(E).$$

From this it follows that $\alpha f' \ge \alpha f$, since $k(E) > 0$.

Definition 9-113: An equilibrium potential for $E$ is a potential with
support in $E$ which has total charge one and has constant values on $E$.

This definition of equilibrium potential agrees with the one for
transient chains (Definition 8-27) only up to a constant multiple, but it
will be a more convenient definition for recurrent chains. We shall
discuss the effect of the change after proving that equilibrium potentials
exist.

Theorem 9-114: Any set $E$ in $\mathcal{L}$ has a unique equilibrium potential.
The equilibrium charge is $l^E$ and the value of the equilibrium potential
on the set is $-k(E)$. For $j$ not in $E$ it has the value $-k(E) - {}^E d_j$.

Proof: Lemma 9-95, together with the fact that $\alpha l^E = 1$, shows that
$-Kl^E$ is an equilibrium potential and has the values specified in the
theorem.

Conversely, if $f$ is an equilibrium charge, then by Proposition 9-109,
$f_E = (I - P^E)g_E + (\alpha f)l^E = (I - P^E)g_E + l^E$. Since $g_E$ is constant
and $P^E 1 = 1$, $(I - P^E)g_E = 0$. Thus $f_E = l^E$, and the equilibrium
potential is unique.

If we renormalize the equilibrium potential for $E$ so that its value is
one on $E$, then its total charge becomes $-1/k(E)$. Thus if we were
following the definitions of transient potential theory, we would define
$-1/k(E)$ to be the capacity of $E$. The set function $-1/k(E)$ is a monotone
function, as is $k(E)$, but it does not have as nice a probabilistic
interpretation as our choice.

We shall now prove the recurrent Principle of Balayage. 

Theorem 9-115: If $g$ is a pure potential and if $E \in \mathcal{L}$, then there is a
unique pure potential $g'$ with support in $E$ such that

(1) $g'_E = g_E$,

(2) $g' \le g$,

(3) $\alpha f' \ge \alpha f$,

(4) $\alpha f' = -k(E)^{-1}\lambda^E g$,

and this unique potential is $B^E g + (\lambda^E g / k(E))\,{}^E d$.

Proof: Since $g'$ is determined by its values on $E$, there is only one
potential satisfying (1), and its charge is

$$\begin{aligned}
f'_E &= -K_E^{-1}g_E \\
&= (I - P^E)g_E - \frac{\lambda^E g}{k(E)}\,l^E && \text{by Proposition 9-91} \\
&\ge (I - P^E)g_E + (\alpha f)\,l^E \\
&\ge f_E.
\end{aligned}$$

The last two steps are by Proposition 9-110. Thus $g'$ is pure. Hence
$\alpha f' \ge \alpha f$, by Lemma 9-112, and we have (3). Result (4) follows by
multiplying the second equation above by $\alpha$. Furthermore

$$\begin{aligned}
g' &= B^E g' - (\alpha f')\,{}^E d && \text{by Proposition 9-109} \\
&\le B^E g - (\alpha f)\,{}^E d && \text{by (1) and (3)} \\
&\le g && \text{by Proposition 9-110.}
\end{aligned}$$

Hence we have (2). The formula for $g'$ follows from (4) and the second
part of Proposition 9-109.

The potential g' is, as in the transient case, referred to as the balayage 
potential of g on E. 

Next, we prove the recurrent Principle of Domination. 

Theorem 9-116: If $g$ and $g'$ are pure potentials, if $g'$ has its support in
$E \in \mathcal{L}$, and if $g_E \ge g'_E$, then $g \ge g'$.

Proof:

$$\begin{aligned}
g &\ge B^E g - (\alpha f)\,{}^E d && \text{by Proposition 9-110} \\
&\ge B^E g' - (\alpha f')\,{}^E d && \text{by hypothesis and by Lemma 9-112} \\
&= g' && \text{by Proposition 9-109.}
\end{aligned}$$

Corollary 9-117: The balayage potential of $g$ on $E$ is the infimum of
all pure potentials that dominate $g$ on $E$.

Proof: If $\bar g$ is a pure potential and $\bar g_E \ge g_E$, then $\bar g_E \ge g'_E$; hence
$\bar g \ge g'$ by domination.

Proposition 9-118: If $g$ is a pure potential with support in $E \in \mathcal{L}$
and total charge 1, then

$$\min_{i \in E} g_i \le -k(E) \le \max_{i \in E} g_i.$$

Proof: Assume that the first inequality is false. Then $-(Kf)_i >
-k(E)$ for all $i \in E$. Hence we may choose a $c > 1$ such that

$$-K(cf) \ge -k(E)1 = -Kl^E$$

on $E$. Thus $c = c(\alpha f) \le \alpha l^E = 1$, by Lemma 9-112, which is a
contradiction. The other inequality is proved similarly.

We retain the same definition of energy as in transient theory.

Definition 9-119: If $g = -Kf$ is a potential, the energy of $g$, denoted
$I(g)$, is

I(Sf) = 1^9 = 

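Lemma 9-120: For any potential $g$ with support in a set $E \in \mathcal{L}$,
$I(g) = \nu_E(I - P^E)g_E + (\alpha f)(\hat\lambda^E g)$. If $\hat P = P$, then
$I(g) = \nu_E(I - P^E)g_E - k(E)(\alpha f)^2$.

Proof: By Proposition 9-109,

$$(I - P^E)g_E = f_E - (\alpha f)\,l^E.$$

Hence

$$\nu_E(I - P^E)g_E = I(g) - (\alpha f)(\nu_E l^E) = I(g) - (\alpha f)(\hat\lambda^E g).$$

If $\hat P = P$, then $\hat\lambda^E g = \lambda^E g = -(\alpha f)k(E)$ by Proposition 9-109.

Lemma 9-121: $\nu_E(I - P^E)g_E = \tfrac{1}{2}\sum_{i,j \in E} \alpha_i P^E_{ij}(g_i - g_j)^2 \ge 0$, and the
value is 0 only if $g$ is constant on $E$.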

Proof: The proof proceeds as in Lemma 8-54, but P^1 = 1 and 
oc^P^ = hence = 0 and = 0. Let F be the subset of all 

states of E on which g = g^. If P / P, there are states i g F and 
j e P such that Pfj > 0, since the states of E communicate. Then 

ve{i - p^)gE > \^iPij{gi - gjf > 

Lemma 9-122: If $E \in \mathcal{L}$, the energy of the equilibrium potential of
$E$ is $-k(E)$.

Proof: Since $g = -Kl^E$ is constant on $E$, $(I - P^E)g_E = 0$. Since
$\alpha f = 1$, $\hat\lambda^E g = \hat\lambda^E(-k(E)1) = -k(E)$. Hence by Lemma 9-120, the
result follows.

Theorem 9-123: If $\hat P = P$ and $E \in \mathcal{L}$, then among all potentials
with support in $E$ and total charge 1 the equilibrium potential alone
minimizes energy.

Proof: If $g$ has support in $E$ and $\alpha f = 1$, $I(g) \ge -k(E)$, by Lemmas
9-120 and 9-121. Equality holds only if $g$ is constant on $E$, in which
case $g$ is the equilibrium potential.

10. A model for potential theory 

By an electric circuit we shall mean a denumerable number of
terminals, some of which are connected by wires. The wire connecting
terminals $i$ and $j$ has resistance $r_{ij}$ and conductance $c_{ij} = 1/r_{ij}$. If there
is no connection between $i$ and $j$, we let $c_{ij} = 0$; also we define $c_{ii} = 0$.
We shall assume that:

(1) For each $i$, $\sum_j c_{ij} < \infty$. This condition is satisfied, for example,
if each terminal is connected to only finitely many other
terminals.

(2) The circuit is connected.

From physics we have the following two laws. In the present context
the first one may be taken as a definition of current in terms of voltage:

(1) (Ohm) If $i$ is at voltage $v_i$ and $j$ is at voltage $v_j$, the current
flowing from $j$ to $i$ is $(v_i - v_j)c_{ij}$.

(2) (Kirchhoff) The sum of all currents flowing into a given terminal
is 0.

If an outside source is attached to a certain terminal, then the
Kirchhoff Law (2) does not apply unless account is taken of current
flowing from the outside source. But for all terminals $i$ which are not
attached to any outside sources, Ohm's Law and Kirchhoff's Law
imply:

(3) $\sum_k (v_i - v_k)c_{ik} = 0.$

If a finite set $E$ of terminals is kept at prescribed non-zero voltages by
an outside source and if there is a finite set $F$ with $E \subset F$ such that all
points in the set $\tilde F$ (the complement of $F$) are grounded (kept at 0 voltage), then we shall call
the problem of finding voltages at the points $i$ of $F - E$ in such a way
that (3) holds a standard voltage problem. Note that for finite circuits
the voltages may be prescribed at an arbitrary subset $E$ of terminals.

Definition 9-124: A Markov chain P with $P1 = 1$ is said to represent
some given electric circuit if each state corresponds to a terminal and
if any standard voltage problem can be solved in such a way that the
voltage vector is P-regular at points of $F - E$.

It follows from Theorem 8-41 that if P represents a circuit, then the
solution to a standard voltage problem is unique and satisfies

$$v = B^{E \cup \tilde F}v.$$

Thus the voltage at a point $i$ of $F - E$ can be interpreted as the
expected final payment if the chain is started at $i$ and stopped at
$E \cup \tilde F$ and if a payment of $v_j$ is received if the process reaches state $j$
of $E$.

Proposition 9-125: For any electric circuit there is a unique Markov
chain P such that $P_{ii} = 0$ and P represents the circuit.







Proof: We first prove uniqueness; let P represent the circuit. Let $i$
and $j$ be any two distinct states, and let $E = \{j\}$ and $F = \{i, j\}$. Put
a unit voltage at $j$ and ground $\tilde F$. Then by (3),

$$\sum_k (v_i - v_k)c_{ik} = 0.$$

Since $v_k = 0$ except when $k$ is $i$ or $j$ and since $c_{ii} = 0$, we have

$$v_i \sum_k c_{ik} - c_{ij} = 0.$$

Hence

$$v_i = \frac{c_{ij}}{\sum_k c_{ik}}.$$

(The denominator is not zero since the circuit is connected.) Now
since P represents the circuit,

$$v_i = (Pv)_i = \sum_k P_{ik}v_k = P_{ii}v_i + P_{ij}v_j.$$

Since $P_{ii} = 0$ and since $v_j = 1$, we have

$$v_i = P_{ij}.$$

Therefore

$$P_{ij} = \frac{c_{ij}}{\sum_k c_{ik}},$$

and P is unique.

Next we prove existence. Let the circuit be given, and define

$$P_{ij} = \frac{c_{ij}}{\sum_k c_{ik}}.$$

Then $P_{ij} \ge 0$, $P_{ii} = 0$, and

$$\sum_j P_{ij} = \frac{\sum_j c_{ij}}{\sum_k c_{ik}} = 1.$$

Hence P is a transition matrix with $P_{ii} = 0$ and $P1 = 1$. Thus let $E$
and $F$ be finite sets with $E \subset F$, let $v_E$ be specified, and let $v_{\tilde F} = 0$.
We are to show the standard voltage problem has a solution. Define
$v$ by

$$v = B^{E \cup \tilde F}\begin{pmatrix} v_E \\ 0 \end{pmatrix}.$$

Then $v$ is regular on $F - E$, or

$$v_i = \sum_k P_{ik}v_k$$

for $i$ in $F - E$. Since $P1 = 1$,

$$\sum_k P_{ik}(v_i - v_k) = 0$$

or

$$\sum_k \frac{c_{ik}}{\sum_m c_{im}}(v_i - v_k) = 0.$$

Thus

$$\sum_k (v_i - v_k)c_{ik} = 0,$$

and $v$ is a solution to the standard voltage problem.

Corollary 9-126: Every standard voltage problem has a solution, and 
that solution is unique. 

Proof: Existence follows from Proposition 9-125 and uniqueness 
follows from Theorem 8-41. 
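
As a concrete illustration of Proposition 9-125 and Corollary 9-126, the following Python sketch builds the representing chain from a table of conductances and solves a voltage problem exactly. It is only a sketch under stated assumptions: the four-terminal circuit, its integer conductances, and the helper names `transition_matrix` and `solve_voltages` are hypothetical and do not come from the text. Since the circuit is finite, the voltages are simply prescribed on a subset of terminals, which the text allows for finite circuits.

```python
from fractions import Fraction

def transition_matrix(c):
    """P[i][j] = c[i][j] / sum_k c[i][k]; assumes integer conductances."""
    n = len(c)
    return [[Fraction(c[i][j], sum(c[i])) for j in range(n)] for i in range(n)]

def solve_voltages(c, fixed):
    """Solve sum_k (v_i - v_k) c_ik = 0 at every terminal not in `fixed`.

    `fixed` maps terminal index -> prescribed voltage. Uses Gauss-Jordan
    elimination over the rationals and assumes a connected circuit with
    at least one prescribed voltage, so the solution is unique.
    """
    n = len(c)
    free = [i for i in range(n) if i not in fixed]
    m = len(free)
    rows = []
    for i in free:
        # (sum_k c_ik) v_i - sum_{free k} c_ik v_k = sum_{fixed k} c_ik v_k
        row = [Fraction(-c[i][k]) for k in free]
        row[free.index(i)] += sum(c[i])
        rhs = sum(Fraction(c[i][k]) * fixed[k] for k in fixed)
        rows.append(row + [rhs])
    for col in range(m):
        piv = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    v = [Fraction(0)] * n
    for i, value in fixed.items():
        v[i] = Fraction(value)
    for idx, i in enumerate(free):
        v[i] = rows[idx][m]
    return v

# Hypothetical circuit: symmetric conductances with zero diagonal.
c = [[0, 1, 2, 0],
     [1, 0, 1, 1],
     [2, 1, 0, 3],
     [0, 1, 3, 0]]
P = transition_matrix(c)
v = solve_voltages(c, {0: 1, 3: 0})

# At the free terminals 1 and 2 the voltage vector is P-regular.
for i in (1, 2):
    assert v[i] == sum(P[i][k] * v[k] for k in range(4))
```

Here terminal 0 is held at unit voltage and terminal 3 is grounded; the final loop checks that the computed voltages satisfy $v_i = (Pv)_i$ at the two free terminals.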



Shortly we shall show exactly how general the class of chains that 
represent circuits is. But first we shall exhibit the connection between 
the currents and voltages of this section and the charges and potentials 
of Markov chain potential theory. In so doing, what we will be 
showing is that electric circuits provide a model for the discrete 
potential theory associated with the class of chains that represent 
circuits. 

In physics, current is the time rate at which charge flows past a 
point — that is, the derivative of charge with respect to time. Markov 
chains, however, have a time scale that is discrete and not continuous, 
and the proper analog of the time rate at which charge flows past a 
point is the amount of charge that moves past the point in unit time. 
With discrete time the charge moves to some point, stays for unit 
time, and then moves to another point. Hence the magnitude of the 
current at a point is equal to the magnitude of the accumulated charge 
at that point. 

Now in a standard voltage problem, current flows in and out of the
circuit through the terminals which are attached to the outside source.
The above considerations lead us to define the charge at terminal $i$ to
be the current $\mu_i$ which flows into the circuit; $\mu_i$ is taken to be negative
if the current flows out. By Kirchhoff's Law (2), the charge will be
zero on the set $F - E$. For $i$ in $E \cup \tilde F$ the charge is given by

$$\mu_i = \sum_k (v_i - v_k)c_{ik}.$$

Before we can connect /x and v in terms of the representing chain, we 
need one preliminary result. 

Proposition 9-127: Let P be a Markov chain which represents an
electric circuit with conductances $c_{ij}$. Then the row vector $\alpha$ defined by

$$\alpha_i = \sum_k c_{ik}$$

is P-regular, and the $\alpha$-dual of P is P.




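In terms of Proposition 9-127 we can transform the equation $\mu_i =
\sum_k (v_i - v_k)c_{ik}$ as follows:

$$\begin{aligned}
\mu_i &= v_i\alpha_i - \sum_k v_k c_{ik} \\
&= v_i\alpha_i - \sum_k v_k\alpha_k P_{ki} \\
&= [(\text{dual } v)(I - P)]_i.
\end{aligned}$$

Hence $\mu = (\text{dual } v)(I - P)$. Since $\hat P = P$,

$$\text{dual } \mu = (I - P)v.$$

Let $f_i = \mu_i/\alpha_i$, that is, $f$ is the dual of the vector of charges at the various
terminals. Then

$$f = (I - P)v.$$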



We note that all pairs of states communicate in P since the circuit is
connected; hence P is either transient or recurrent. But $v$ has only
finitely many non-zero entries, so that $\alpha v$ is finite. It follows that if P
is transient or null, then $v$ is a potential in the Markov chain sense.
And if P is ergodic, then v — (a/)1 is a potential, in any case, / is the 
charge. 

Conversely, if, in a chain which represents a circuit, $g$ is a potential
vanishing outside a finite set $F$ with a charge $f$ such that $f$ has total
charge 0 and $f$ has support in $E \cup \tilde F$, where $E \subset F$, then $g$ solves the
standard voltage problem for $E$ and $F$ with the specified values $g_E$ on
$E$. It is in this sense that electric circuits form a model for potential
theory.

We turn to the problem of classifying all Markov chains which
represent circuits. A chain P is said to be $\alpha$-reversible if $\hat P$, the
$\alpha$-dual of P, equals P.



Proposition 9-128: A Markov chain with $P_{ii} = 0$ represents a circuit
if and only if its states communicate and it has a positive regular
measure $\alpha$ with respect to which it is $\alpha$-reversible.

Proof: If P represents a circuit, then it has a regular measure $\alpha$ and
is $\alpha$-reversible by Proposition 9-127. Its states communicate since the
circuit is connected.

Conversely, suppose that P is a transition matrix with the stated
properties. Introduce the electric circuit with the states of P as
terminals and with $c_{ij} = \alpha_i P_{ij}$. The circuit is well defined since

$$c_{ii} = \alpha_i P_{ii} = 0$$

and since

$$c_{ij} = \alpha_i P_{ij} = \alpha_j P_{ji} = c_{ji}.$$

Since the states communicate, the circuit is connected. To see that P
represents the circuit, we note that

$$\alpha_i = \sum_k \alpha_i P_{ik} = \sum_k c_{ik}.$$

Thus

$$P_{ij} = \frac{c_{ij}}{\sum_k c_{ik}}.$$
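
The correspondence in Proposition 9-128 between circuits and reversible chains can be checked mechanically on a small example. The sketch below is illustrative only: the conductance matrix and the function names `chain_from_conductances`, `is_regular`, `is_reversible`, and `conductances_from_chain` are hypothetical choices made here, and exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction

def chain_from_conductances(c):
    """Return (alpha, P) with alpha_i = sum_k c_ik and P_ij = c_ij / alpha_i."""
    n = len(c)
    alpha = [sum(row) for row in c]
    P = [[Fraction(c[i][j], alpha[i]) for j in range(n)] for i in range(n)]
    return alpha, P

def is_regular(alpha, P):
    """Check alpha P = alpha."""
    n = len(P)
    return all(sum(alpha[i] * P[i][j] for i in range(n)) == alpha[j]
               for j in range(n))

def is_reversible(alpha, P):
    """Check alpha_i P_ij = alpha_j P_ji for all i, j."""
    n = len(P)
    return all(alpha[i] * P[i][j] == alpha[j] * P[j][i]
               for i in range(n) for j in range(n))

def conductances_from_chain(alpha, P):
    """Recover c_ij = alpha_i P_ij, as in the converse half of the proof."""
    n = len(P)
    return [[alpha[i] * P[i][j] for j in range(n)] for i in range(n)]

# Hypothetical symmetric conductances with zero diagonal.
c = [[0, 1, 2, 0],
     [1, 0, 1, 1],
     [2, 1, 0, 3],
     [0, 1, 3, 0]]
alpha, P = chain_from_conductances(c)
round_trip = conductances_from_chain(alpha, P)
```

Going from conductances to the chain and back returns the original conductances, which is the round trip used in the two halves of the proof.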



Finally we consider the problem of when the chain representing a 
circuit is recurrent and when it is transient. 



Lemma 9-129: Let P be a chain which represents an electric circuit.
Let a unit voltage be put at 0, let $F$ be a finite set containing 0, and let
$\tilde F$ be kept at voltage 0. The charge at the terminal 0 is

$$\mu_0 = \alpha_0\,{}^0H_{0\tilde F}.$$

Proof:

k 

= 2 Poki^ ~ since is absorbing 

k 

= 2 Po.(i - 

k 

= 2 Poki'^O ~ '^k) 

k 

1 ^ 

^0 k 

_ 

~ «o 

Lemma 9-130: In any Markov chain a state 0 is recurrent if and only 
if ^Hqp 0 for some (or every) increasing sequence of finite sets F 
with union the set of all states. 

Proof: If 0 is transient, then there is a positive probability 1 — 
that the chain never returns to 0. Hence 

^Hqp > 1 — Hqq > 0 

for every finite set F, and ^H^p cannot approach 0. 

Conversely, let 0 be recurrent. Choose N sufficiently large that 
^ 00 ^ > 1 — €. Choose S close enough to 1 so that 

1 - € < 8^ < 1. 

Then construct an increasing sequence of finite sets Aq, . . ., A^ 
such that ^0 = {^} such that the probability of stepping from any 
state oi A ^ to state of A^^^ is greater than 8. Let F be any finite 
set containing A^. The probability that the process started at 0 is, 
for every n < N,m A^^ after n steps is greater than 8^. Hence 

< 1 - 8^ < e. 

Since 

1 - € < ^ 

we have 

> 1 - 2c. 

But

^-^00 + ^ Bqp = 1 ,
so that

< 2e

for all F containing Af^.

Proposition 9-131: Let P be a chain which represents an electric
circuit. Then the following is a necessary and sufficient condition for
P to be recurrent: If terminal 0 is kept at a unit voltage, if all terminals
outside a finite set $F$ are grounded, and if $F$ is allowed to increase to $S$,
then the current at terminal 0 tends to zero. Furthermore a necessary
and sufficient condition for P to be ergodic is that $\sum_{i,j} c_{ij} < \infty$.

Proof: The first assertion follows directly from Lemmas 9-129 and
9-130. The second assertion follows from the fact that P is ergodic if
and only if $\alpha 1 < \infty$, where $\alpha_i = \sum_j c_{ij}$.
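
The recurrence criterion of Proposition 9-131 can be watched numerically on a line of terminals. The following Python sketch is an illustration under assumptions made here, not part of the text: the terminals are the integers, the wire between $i$ and $i+1$ (and, symmetrically, between $-i$ and $-i-1$) has conductance `cond(i)`, terminal 0 is at unit voltage, and every terminal at distance $n$ or more from 0 is grounded. The name `current_at_zero` and the two conductance choices are hypothetical. The voltages on one side are found by exact tridiagonal elimination of equation (3).

```python
from fractions import Fraction

def current_at_zero(cond, n):
    """Current flowing into the circuit at terminal 0 when |i| >= n is grounded."""
    if n == 1:
        return 2 * Fraction(cond(0))      # both neighbours of 0 are grounded
    m = n - 1                             # unknown voltages v_1, ..., v_{n-1}
    # Row for v_i: -c_{i-1} v_{i-1} + (c_{i-1} + c_i) v_i - c_i v_{i+1} = 0
    b = [Fraction(cond(i - 1) + cond(i)) for i in range(1, m + 1)]
    d = [Fraction(0)] * m
    d[0] = Fraction(cond(0))              # contribution of v_0 = 1
    for j in range(1, m):                 # forward elimination
        w = Fraction(-cond(j)) / b[j - 1]
        b[j] -= w * Fraction(-cond(j))
        d[j] -= w * d[j - 1]
    v = [Fraction(0)] * m
    v[m - 1] = d[m - 1] / b[m - 1]
    for j in range(m - 2, -1, -1):        # back substitution
        v[j] = (d[j] + cond(j + 1) * v[j + 1]) / b[j]
    # The two sides are mirror images, so the total current is doubled.
    return 2 * cond(0) * (1 - v[0])

unit = lambda i: 1            # all conductances equal to 1
geometric = lambda i: 2 ** i  # conductances doubling with distance from 0

currents_unit = [current_at_zero(unit, n) for n in (1, 2, 4, 8)]
currents_geom = [current_at_zero(geometric, n) for n in (1, 2, 4, 8)]
```

With unit conductances the current is $2/n$ and tends to zero, in line with the recurrence of the symmetric walk on the integers. With the doubling conductances the current stays above 1, which matches the fact that the represented chain steps away from 0 with probability 2/3 and is transient.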

11. A nonnormal chain and other examples 

We shall show by examples in this section that all three of the 
following conjectures are false: 

(1) Small sets and ergodic sets are identical concepts, or at least one 
of the notions includes the other. 

(2) All null chains are normal. 

(3) The existence of either C or G implies the existence of both.

First, we settle the independence of the notions of small sets and 
ergodic sets. We saw in Section 6 an example of an ergodic set which 
is not small, and we shall now produce a small set which is not ergodic. 

Let P be a chain with states the non-negative integers and with
transition probabilities

$$P_{0i} = p_i = P_{i0} \quad \text{for } i > 0$$

$$P_{ii} = q_i = 1 - p_i.$$

All other entries of P are 0. We impose no requirements on the
$p_i$ yet except that $p_i > 0$ and $\sum p_i = 1$. It is clear that $\bar H_{00} = 1$;
hence P is recurrent. Since $P = P^T$, $\alpha = 1^T$ is regular and P is null.
Only finite sets are ergodic, and thus it is sufficient to exhibit an infinite
small set. For any set $E$ containing 0,

$$\lambda^E_j = \lim_n \sum_k P^{(n)}_{0k}B^E_{kj} = \lim_n P^{(n)}_{0j} = 0 \quad \text{for } j > 0,$$

whereas

$$\begin{aligned}
\lambda^E_0 &= \lim_n \sum_k P^{(n)}_{0k}B^E_{k0} \\
&= \lim_n P^{(n)}_{00} + \lim_n \sum_{k \in \tilde E} P^{(n)}_{0k} \\
&= 1 - \lim_n P^{(n)}_{0,\,E - \{0\}}.
\end{aligned}$$

Thus any set $E$ containing 0 such that $P^{(n)}_{0E} \to 0$ satisfies $\lambda^E 1 = 1$ and is
therefore a small set. Let

$$p_i = 4^{-k} \quad \text{for } 2^k - 1 \le i \le 2^{k+1} - 2.$$

The $p_i$ assume the constant value $4^{-k}$ on a block of length $2^k$. Thus

$$\sum_i p_i = \sum_{k=1}^{\infty} 2^{-k} = 1.$$

Let $E$ consist of 0 and one state, such as $2^k$, from each block, and label
the representative of the $k$th block in $E$ as $e_k$. Then

$$P^{(n)}_{0E} = P^{(n)}_{00} + \sum_{k=1}^{N-1} P^{(n)}_{0e_k} + P^{(n)}_{0T_N},$$

where $T_N = \{e_N, e_{N+1}, \ldots\}$. We can form $2^N$ disjoint sets like $T_N$,
each differing from it in every representative selected from the $N$th
block on. By symmetry $P^{(n)}_{0T'_N} = P^{(n)}_{0T_N}$ for all such sets $T'_N$. Hence
$P^{(n)}_{0T_N} \le 1/2^N$ for all $n$. Since $P^{(n)}_{ij} \to 0$ in a null chain,

$$\limsup_n P^{(n)}_{0E} \le 1/2^N,$$

and we must have $P^{(n)}_{0E} \to 0$.

Second, we shall construct a nonnormal null chain. We shall use
generating functions and require the following facts:

(1) If $F(t) = \sum_n f_n t^n$ and $G(t) = \sum_n g_n t^n$, then

$$F(t) \cdot G(t) = \sum_n \Big(\sum_{k=0}^{n} f_k g_{n-k}\Big) t^n.$$

That is, the coefficients of the series for $F(t) \cdot G(t)$ are the convolutions
$\sum_k f_k g_{n-k}$.

(2) The Abel sum of the series $\sum_n f_n$ is $\lim_{t \to 1^-} F(t)$. If the series
$\sum f_n$ converges, then its Abel sum exists and has the same value
(Proposition 1-62).
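
Both facts are easy to try out numerically. The short Python sketch below is an illustration only; the function names `series_product` and `abel_partial` and the sample series are choices made here, not taken from the text. The first function forms the convolution coefficients of fact (1) for truncated series, and the second evaluates a truncated generating function near $t = 1$, using the alternating series $1 - 1 + 1 - \cdots$, which does not converge but has generating function $1/(1 + t)$.

```python
def series_product(f, g):
    """Coefficients of F(t)*G(t) up to the shorter length, by convolution."""
    n = min(len(f), len(g))
    return [sum(f[k] * g[m - k] for k in range(m + 1)) for m in range(n)]

def abel_partial(coeff, t, terms):
    """Truncated generating function: sum of coeff(n) * t**n for n < terms."""
    return sum(coeff(n) * t ** n for n in range(terms))

# Fact (1): (1 + t + t^2 + t^3)(1 + 2t + 3t^2 + 4t^3), kept through degree 3.
product = series_product([1, 1, 1, 1], [1, 2, 3, 4])

# Fact (2): the generating function of 1 - 1 + 1 - ... evaluated near t = 1.
t = 0.999
grandi = abel_partial(lambda n: (-1) ** n, t, 50000)
```

As $t$ increases to 1 the value $1/(1 + t)$ tends to 1/2, so the alternating series is Abel summable to 1/2 even though its partial sums oscillate.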

Let P be any recurrent chain; we begin by deriving a necessary and
sufficient condition for the series $\sum_n (P^{(n)}_{11} - P^{(n)}_{01})$ defining $C_{01}$ to be
Abel summable. Let $E = \{0, 1\}$ and define generating functions as
follows:



^oo(o = 2 ^oi(<) = 2 

n n 

^io(0 = 2 = 2 

n n 

P,,(<) = 2 



Hq{1) — -4oo(0 + -4oi(0 — 2 -^0^" 

n 



^i(<) = ^io(<) + = 2 

n 



<2(0 = 2 (^u - = Pii(t) - Poi(0 

n 



R(t) 



1 - ^i(0 

1 - HS) 



We note that the series defining Cqi is Abel summable if and only if 
lim(^i- Q(t) exists. We shall prove that this limit exists if and only if 
lim(_^- R(t) exists. We have 



k 

or 

^Ol(0 = ^Ooi^)P Ol(0 + ll(0* 

Also 

= Sno + 2('^io^^V + 

k 

or 

Pii(0 = 1 + + A,,{t)P^SY 

Solving these equations for Poi(0 we find 

Q{t) = Pii(f) -- P Ql(t) 

_ ^ ~ -^oo(0 ~ ^oi(0 

(1 — Aoo(0)(l ~ ^ii(O) “■ -^oi(0^io(0 

_ 1 

(1 — Aoo(f))P(0 + ^io(0 

Since Aoo(l) = < 1 since -4io(l) = ^^10 > 0. Q(t) 

exists if and only if lim^^i- R{t) exists. 

The example of a nonnormal chain will be like the earlier example in
this section, only "doubled." The states are

0, $a_1, a_2, a_3, \ldots$

1, $b_1, b_2, b_3, \ldots$

and if $p_i > 0$, $p'_i > 0$, $\sum p_i = 1$, and $\sum p'_i = 1$, then the transition
probabilities are

$$P_{0a_i} = p_i = P_{a_i1}$$

$$P_{a_ia_i} = q_i = 1 - p_i$$

$$P_{1b_i} = p'_i = P_{b_i0}$$

$$P_{b_ib_i} = q'_i = 1 - p'_i.$$

All other entries are 0. We see easily that $\bar H_{00} = 1$ and that $\alpha = 1^T$
is regular, hence the chain is recurrent and null. For $E = \{0, 1\}$,



= -PS.' = 2 Pi^r^Pi for n> 2. 

i 

Therefore 



Similarly, 



Ho(t) = 'ZPi^ 2 = 2 

i n = 2 i 



HS) 



f 1 - q[t' 



and we have defined R{t) by 



R(t) = 



1 - HS) 

1 — Hglt) 



1 - q/ 



We again choose the p/s in blocks as follows. Let {U}^} and {n'j^} be two 
rapidly increasing sequences (with magnitude specified later) such that 
U}^ < n'f^ < 7ij^ + i. Let there be consecutive p/s equal to — 
1/(2^71,^), and let there be n'^ consecutive (p[Y^ equal to = l/(2^/iJ^). 
The remainder of the proof consists in showing that for suitably chosen 
{n^ and 

lim R{\ - €^) ^ lim R{\ - e;). 

n-*oo n-*oo 



We shall only sketch the argument. 

We begin by estimating -R(l — ej for large n. We have 



ffo(i - e„) = 2 

i 



PiHl - 

1 - (1 - - ej 



y Pi^ 
r Pi + 



CO 



= 2 

= 1 






V — 
= 1 



For k = n, €j^/(€j^ + e^) = Choose the sequence so that is 
negligible compared to when k < n. Then 

n - 1 

~ 2 + 2-"-2-i = 1 - 2-" + i + 2-"-i. 

fc = l 

Similarly, 

n-l 



H,{\ - o ~ 2 2 2"^ = !- 

• - ^k “T 



k = l 



k = 0
which differs from $H_0(1 - \epsilon_n)$ if $\epsilon_n$ is chosen to be negligible compared
with e'n but much bigger than 

O-n+l A 

m - ^n) ~ 



2 -n+l _ 2-«-i 3 

If f = 1 — and n is large we obtain, similarly, 



^o(l - 4) ~ 2 2-'^— J-7 ~ 2 2-'^=!- 2-" 

k = l ^ k = l 



and 



n-l 



^i(l - 4) ~ 2 ~ 2 2-" + 2-’‘-2-i 

k = l “r k = l 



= 1 



+ 2 - 



The asymmetry in j^o( 1 ~ ^n) ^i(l ~ «n) arises because the 

condition < n'^ < + i is not symmetric. We thus find 



^(1 - 4) 



- 71+1 



- 2 - 
2-^ 



Therefore, $\lim_{t \to 1^-} R(t)$ does not exist and $C_{01}$ cannot exist. In
particular, P is not normal.

This example has the property that neither C nor G exists. The
reverse process has transition probabilities $\hat P_{0b_i} = p'_i = \hat P_{b_i1}$
and $\hat P_{1a_i} = p_i = \hat P_{a_i0}$. Thus $\hat P$ is the same as P except that the roles
of 0 and 1 have been interchanged. The above argument therefore
shows that $\hat C_{10}$ does not exist, and since $\hat C_{10} = G_{01}$ if either exists, G
cannot exist.

Not even reversibility ($\hat P = P$) is a sufficient condition for a null
chain to be normal. A slight modification of the above example
provides a counterexample. Let the states be as before, and define 
Pi and p[ as above. Let 

^Oai ~ ^ aiO ~ \Pu ~ 1 ~ 2Pi^ 

^atai ^ 2Pi^ ^bibi ^ 2Pi> 

and 

P — P — 1 

Set all other entries of P equal to 0. Then $P = P^T$, so that $\alpha = 1^T$
and $\hat P = P^T = P$. But the same kind of computation as for the
preceding example shows that $C_{01}$ does not exist. Thus we see that
even a symmetric P need not be normal.

Third, we show that the existence of one of C and G does not imply
the existence of the other. Again we modify the first example of a 
nonnormal chain. Let 

Poa, = Pu Pa, a, = -^ 16 , = p'i, and = g-/ 

as before, and use the same ^’s. But set 

Pa,o = Pa , 1 = \pi and Pft,o = Pb,i = \p\- 

It is clear that i^o!(o.i} ^^e the same as before, so that 

lim^_i- does not exist and Cq^ does not exist. On the other 
hand, the reverse chain is no longer of the same type and the argument 
for the nonexistence of G fails. In fact, 

OA, = lim [p<v + 2 p^% °Pa,i + 2 nv. 

= lim |^P<",> + 1 2 (Po% + ni,’)] 

_ 1 
” 2 

Hence Gq^ = J exists. Moreover, if x and y are any two states 
other than 0 and 1, 

X\ _ 1 XU 1 1 XU 

— 2 ^ 2 ly* 

Therefore all of G exists. The reverse chain is an example in which C 
exists but G does not. 

12. Two-dimensional symmetric random walk 

The purpose of this section is to show how the results of Section 8 
may be used to work out some of the matrices associated with the 
two-dimensional symmetric random walk. 

First we shall find the operator K. In this example, $K = C = G$ and

$$K_{(x,y),(x',y')} = K_{(x-x',\,y-y'),(0,0)}.$$

Hence it suffices to compute one column of K. We let

$$k(x, y) = K_{(x,y),(0,0)}.$$

A row of $I - P$ is a charge whose potential is the corresponding row
of $I$. For this process a row of $I - P$ has only finitely many non-zero
entries and is therefore a weak charge. By Theorem 9-84, $(I - P)K =
-I$. Thus $k(x, y)$ is the average of its values at the four neighboring
points, except that at the origin the average is one larger. We know
also that $k(0, 0) = 0$. The high degree of symmetry of the chain
implies that

$$k(x, y) = k(-x, y) = k(y, x).$$

In particular, the values at the four points neighboring the origin must
be the same. Since their average is one, $k(0, 1) = 1$.

We shall need one more result. It can be shown (see Spitzer [1964])
that

$$k(x, x) = \frac{4}{\pi}\Big(1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2x - 1}\Big) \quad \text{for } x > 0.$$

This identity, together with the properties above, determines the
function $k$.

In fact, it suffices to restrict attention to $0 \le x \le y$. We know that
$k(0, 0) = 0$, $k(0, 1) = 1$, and $k(1, 1) = 4/\pi$. If we know the values of
$k(x, y)$ up to a given $y_0$ for all $x$ such that $0 \le x \le y$, then we can find
the values of $k(x, y_0 + 1)$ for $0 \le x \le y_0 + 1$. The averaging and
symmetry properties give

$$k(0, y_0 + 1) = 4k(0, y_0) - 2k(1, y_0) - k(0, y_0 - 1)$$

$$k(x, y_0 + 1) = 4k(x, y_0) - k(x + 1, y_0) - k(x - 1, y_0) - k(x, y_0 - 1) \quad \text{for } 0 < x < y_0$$

$$k(y_0, y_0 + 1) = 2k(y_0, y_0) - k(y_0 - 1, y_0).$$

And $k(y_0 + 1, y_0 + 1)$ is given by the identity for $k(x, x)$.

The equations above thus are recursion equations for $k(x, y)$.
Actually these equations are so simple that we apparently have a very
rapid method of computing $K_E$ for large finite sets $E$. Unfortunately
the recursion is highly sensitive to rounding errors.
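
The recursion can be written out in a few lines. The Python sketch below is an illustration, not the authors' program: the function name `potential_kernel` and the dictionary layout are assumptions made here, and it uses ordinary double-precision floats, so the sensitivity to rounding noted above still limits how far the table can be trusted. Values are stored only for $0 \le x \le y$; the other octants follow from the symmetries $k(x, y) = k(-x, y) = k(y, x)$.

```python
import math

def potential_kernel(ymax):
    """Return {(x, y): k(x, y)} for 0 <= x <= y <= ymax, with ymax >= 1."""
    def diagonal(x):
        # k(x, x) = (4/pi)(1 + 1/3 + ... + 1/(2x - 1)) for x > 0
        return (4 / math.pi) * sum(1 / (2 * j - 1) for j in range(1, x + 1))

    k = {(0, 0): 0.0, (0, 1): 1.0, (1, 1): diagonal(1)}
    for y0 in range(1, ymax):
        # Left edge: the neighbours (1, y0) and (-1, y0) have equal values.
        k[(0, y0 + 1)] = 4 * k[(0, y0)] - 2 * k[(1, y0)] - k[(0, y0 - 1)]
        # Interior points of the row y0.
        for x in range(1, y0):
            k[(x, y0 + 1)] = (4 * k[(x, y0)] - k[(x + 1, y0)]
                              - k[(x - 1, y0)] - k[(x, y0 - 1)])
        # Averaging at the diagonal point (y0, y0), using k(x, y) = k(y, x).
        k[(y0, y0 + 1)] = 2 * k[(y0, y0)] - k[(y0 - 1, y0)]
        # New diagonal entry from the closed form.
        k[(y0 + 1, y0 + 1)] = diagonal(y0 + 1)
    return k

k = potential_kernel(9)
```

For small arguments the recursion can be checked by hand: it gives $k(0, 2) = 4 - 8/\pi$, $k(1, 2) = 8/\pi - 1$, and $k(0, 3) = 17 - 48/\pi$, which round to 1.454, 1.546, and 1.721.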

In Table 9-1 we give values of $k(x, y)$ for a wedge in the plane. The
computations were carried out to 9-place accuracy, but by $y = 10$ the
effect of rounding errors was noticeable. Any larger table would
require much more accuracy of computation.



Table 9-1. k(x, y) for a Wedge-Shaped Region

y = 9   2.429  2.431  2.444  2.461  2.486  2.514  2.546  2.579  2.614  2.649
y = 8   2.353  2.357  2.372  2.395  2.424  2.459  2.496  2.535  2.574
y = 7   2.267  2.274  2.293  2.322  2.359  2.400  2.444  2.489
y = 6   2.168  2.178  2.203  2.241  2.288  2.339  2.391
y = 5   2.052  2.065  2.101  2.153  2.213  2.276
y = 4   1.908  1.930  1.984  2.056  2.134
y = 3   1.721  1.762  1.849  1.952
y = 2   1.454  1.546  1.698
y = 1   1.000  1.273
y = 0   0

        x = 0  x = 1  x = 2  x = 3  x = 4  x = 5  x = 6  x = 7  x = 8  x = 9

If we wish to find $k(E)$, $\lambda^E$, and $P^E$ for a finite set of points $E$, we first
construct $K_E$ and then compute the inverse $K_E^{-1}$. Let $z = K_E^{-1}1$.
Since $\alpha = 1^T$ and since K is symmetric, the results in the statement and
proof of Corollary 9-92 simplify to

$$k(E) = 1/(1^T z)$$

$$\lambda^E = k(E)z^T$$

$$P^E = I + K_E^{-1} - k(E)zz^T.$$

Calculations for various sets E appear in Tables 9-2, 9-3, and 9-4. 
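
To make the three formulas concrete, the following Python sketch applies them to the three collinear points $E = \{(0, 0), (1, 0), (2, 0)\}$. It is an illustration under assumptions made here: the helper names `solve` and `capacity_data` are hypothetical, the entries of $K_E$ are taken from $k(0, 0) = 0$, $k(1, 0) = 1$, and $k(2, 0) = 4 - 8/\pi$, and the inverse is obtained by plain floating-point Gauss-Jordan elimination with partial pivoting.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def capacity_data(K_E):
    """Return (k(E), lambda^E, P^E) for a symmetric K_E with alpha = 1^T."""
    n = len(K_E)
    unit = lambda j: [1.0 if i == j else 0.0 for i in range(n)]
    cols = [solve(K_E, unit(j)) for j in range(n)]       # columns of the inverse
    inv = [[cols[j][i] for j in range(n)] for i in range(n)]
    z = [sum(row) for row in inv]                        # z = K_E^{-1} 1
    k = 1.0 / sum(z)                                     # k(E) = 1 / (1^T z)
    lam = [k * zi for zi in z]                           # lambda^E = k(E) z^T
    P_E = [[(1.0 if i == j else 0.0) + inv[i][j] - k * z[i] * z[j]
            for j in range(n)] for i in range(n)]
    return k, lam, P_E

c2 = 4 - 8 / math.pi              # k(2, 0) = k(0, 2)
K_E = [[0.0, 1.0, c2],
       [1.0, 0.0, 1.0],
       [c2, 1.0, 0.0]]
k, lam, P_E = capacity_data(K_E)
```

For this set the computation can also be done by hand: $z = (1/2,\ 4/\pi - 1,\ 1/2)$, so $k(E) = \pi/4$ and $\lambda^E = (\pi/8,\ 1 - \pi/4,\ \pi/8)$, in agreement with the entries for $a = 1$ in Table 9-2.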



Table 9-2. $k(E)$, $\lambda^E$, and $P^E$ when E Consists of Three Points on a
Line; E = {(0, 0), (a, 0), (2a, 0)}

a = 1                                  a = 2

k(E) = 0.785                           k(E) = 1.082
$\lambda^E$   0.393  0.215  0.393          $\lambda^E$   0.372  0.256  0.372
$P^E$         0.460  0.393  0.148          $P^E$         0.610  0.256  0.134
              0.393  0.215  0.393                        0.256  0.488  0.256
              0.148  0.393  0.460                        0.134  0.256  0.610

a = 3                                  a = 4

k(E) = 1.256                           k(E) = 1.379
$\lambda^E$   0.365  0.270  0.365          $\lambda^E$   0.361  0.277  0.361
$P^E$         0.663  0.212  0.125          $P^E$         0.693  0.189  0.118
              0.212  0.576  0.212                        0.189  0.621  0.189
              0.125  0.212  0.663                        0.118  0.189  0.693

Table 9-3. $k(E)$, $\lambda^E$, and $P^E$ when E Consists of Three Points on a
Diagonal; E = {(0, 0), (a, a), (2a, 2a)}

a = 1                                  a = 3

k(E) = 0.955                           k(E) = 1.407
$\lambda^E$   0.375  0.250  0.375          $\lambda^E$   0.360  0.279  0.360
$P^E$         0.558  0.295  0.147          $P^E$         0.699  0.185  0.117
              0.295  0.411  0.295                        0.185  0.631  0.185
              0.147  0.295  0.558                        0.117  0.185  0.699

a = 5

k(E) = 1.622
$\lambda^E$   0.356  0.287  0.356
$P^E$         0.738  0.157  0.106
              0.157  0.687  0.157
              0.106  0.157  0.738

Table 9-4. $k(E)$ and $\lambda^E$ for Three Sets E

21 points arranged in an isosceles right triangle of base 6

k(E) = 1.611
$\lambda^E$

0.162
0.046  0.059
0.041  0      0.051
0.041  0      0      0.051
0.044  0      0      0      0.059
0.109  0.044  0.041  0.041  0.046  0.162

13 points arranged in a figure ×

k(E) = 1.778
$\lambda^E$

0.153                                      0.153
       0.064                        0.064
              0.030          0.030
                     0.015
              0.030          0.030
       0.064                        0.064
0.153                                      0.153

30 points in a 5-by-6 rectangle

k(E) = 1.670
$\lambda^E$

0.105  0.042  0.038  0.038  0.042  0.105
0.044  0      0      0      0      0.044
0.041  0      0      0      0      0.041
0.044  0      0      0      0      0.044
0.105  0.042  0.038  0.038  0.042  0.105



It is hard to acquire an intuition for the capacities of sets aside from
their monotonicity. However, the values of $P^E$ and $\lambda^E$ are quite
intuitive. The latter, in this random walk, may be thought of as the
entrance probabilities to $E$ if the chain is started near $\infty$. For example,
in the case of the 5-by-6 rectangle in Table 9-4, it is clear that the
corner positions should be considerably more probable than the points
on the side. Points on the short sides are more probable than points
on the long ones, and the rectangle cannot be entered at an interior
point. Equally instructive are the values of $\lambda^E$ for three-point sets in
Tables 9-2 and 9-3. The middle point is least likely to be hit first,
but the difference decreases as the points are spread farther apart.

13. Problems

1. Prove that 'N jj = for recurrent sums of independent random 
variables. 

2. Prove that for any null chain and for any states i and j there is a finite 
set E such that 

^(n) 

sup T^) < 

n lyiE 

3. Let a recurrent chain P be started in state 0. Let $\alpha^E_i$ be the mean number
of times in state i in the time required to reach the set E and then return
to 0. Thus, for example, if E is the set of all states, then $\alpha^E = (1/\alpha_0)\alpha$.

(a) Prove that for any set E there is a constant $k_E$ such that $\alpha^E = k_E\alpha$.

(b) Let Q be the transition matrix for the transient states when 0 is
made absorbing and let $\bar\alpha$ be the restriction of $\alpha$ to the transient
states. Show that if E is a set which does not contain 0, then

1 - = -«[(/ - Q)B^n 

«0 

where is restricted to the transient states. 

(c) Conclude that if E is a set which does not contain 0, then $1/k_E$ is the
transient capacity of the set E in the chain Q, provided the
distinguished superregular measure is taken as $\bar\alpha$.

4. Prove that for a recurrent chain 

5. For the symmetric random walk in two dimensions, verify that the 
function whose {x, y)th entry is (|a;| + \y\ 4- 1)'^ is a potential, using 
only Corollary 9-16. Show that its charge f satisfies $\alpha f = 0$.

6. Let P be the one-dimensional symmetric random walk, and let E = 
{0, 1, 2, 3}. 

(a) Find $\lambda^E$, $P^E$.

(b) Find all potential functions with support in E, and find their charges.

(c) Compute $\alpha f$ and $\alpha g$ for each.

7. Let P be the symmetric random walk in two dimensions, and let a, b,
and c be three distinguished states. We play a game as follows: The
process is started in 0. Each time it is in a or b we win a dollar, and each
time it is in c we lose two dollars.

(a) Let = Mq [expected gain to time n]. Prove that lim exists, 
and find a computable expression for it. 

(b) What happens if the game is changed so that we lose only one dollar 
when the process is in state c ? 

Problems 8 to 10 lead to the fact that in a normal chain the union of a small 
set and a finite set is small. 

8. Let P be a small set, and let P = P U {A:} have one more point. Show 
that 

Bf, = Bfi + for allied. 

From this equation show that if Af exists, then A^ exists and P is a small 
set.

9. In Problem 8 show that

= 2 for all j 6^. 

meE 

Use the identity of Problem 8 to eliminate the factors Bf^, and solve 
the resulting identity for Bfjc. Prove from this result that Af exists, 
provided P is normal. 

10. Use the results of the two previous problems to prove that the union of 
a small set and a finite set is small in a normal chain. 

Problems 11 to 22 develop a new null example and use it to illustrate results
in the chapter. The state space consists of all points z = (x, y) in the plane
with integer coordinates $\ge 1$. It will be convenient to let n = x + y. We
let (1, 1) be our state 0. Define

$$P_{(x,y),(x+1,y)} = \frac{x}{x+y+1}, \qquad P_{(x,y),(x,y+1)} = \frac{y}{x+y+1},$$

and

$$P_{(x,y),0} = \frac{1}{x+y+1}.$$

11. Verify that P is null recurrent and that $\alpha_z = 2/[(n-1)n]$.

12. Compute $\hat P$.

13. Prove that P^^^ is the same for all z with n = x + y fixed. Do the same
for

14. Let E = [z\x-{-y< Find and $\lambda^E$.

15.
        if x < x', y < y'

        otherwise.

16. Let f be defined by $f_0 = -1$, $f_{(3,2)} = 10$, and $f_z = 0$ otherwise. Show
that f is a charge, and use parts (1) and (3) of Theorem 9-15 to find the
potential g.

17. Check that f = (I - P)g for the functions of Problem 16. Does
$\alpha g = 0$? Verify that $\lambda^E g = 0$ for all finite sets containing the support.

18. Show that $G_{0z} = 0$ and $G_{z0} = (n-1)n/2$.

19. Use Problem 18 and Proposition 9-45 to show that

$G_{zz'}$    - 1 )^ _ 0 ^ / {n' - l)n'

20. Verify that the potential g found in Problem 16 is -Gf.

21. Find C, and compute a potential measure of finite support in two different
ways (in analogy with Problems 16 and 20).

22. Let E be the triangular set of Problem 14, and let x be its characteristic
function. Verify Proposition 9-43.

Problems 23 to 26 develop some theoretical results for an ergodic chain in
terms of the operator K.

23. Express K in terms of M and M. 

24. Prove that = (K^j — 

25. Show that 

26. What happens to the formulas in Problems 24 and 25 if the reference 
point 0 is changed ? 

Problems 27 to 32 carry this development further for finite recurrent chains. 

27. Show that k(S) = = -^ao* 

28. Prove that $K1 = k(S)1$ and $\alpha K = k(S)\alpha$.

29. Prove that $M\alpha^T = c1$, where $c = k(S) - \sum_i k(\{i\})\alpha_i$.

30. Prove that 

31. Prove that the set of charges is the same as the set of potentials. 

32. To what extent do the results of Problems 27 to 31 generalize to strong 
ergodic chains ? 

Problems 33 to 39 are intended to illustrate Problems 23 to 32 for the Land of 
Oz example, which was defined in Chapter 4. [See also Chapter 6, Problem 
1.] We choose the middle state (nice weather) to be the distinguished state 
0 . 

33. Show that $\hat P = P$ (the chain is reversible).

34. Find if. 

35. Find K, using the result of Problem 23. 

36. Find k(S), using the result of Problem 27.

37. Find K, using the result of Problem 30, and compare with the value of K 
found in Problem 35. 

38. Check the results given in Problems 24, 25, 28, and 29. 

39. Find the most general charge and the most general potential function. 
Verify that the set of charges is the same as the set of potentials. 

Problems 40 to 48 work out the probabilistic solution of the so-called Second 
Boundary Value Problem in the sense that Theorem 8-41 presented the 
solution of the First Boundary Value Problem. Let P be an absorbing 
chain whose transient states communicate and whose absorbing states form a 
finite set B. B is thought of as our “boundary.” To each state k in B we
associate a “neighboring” transient state k'. For a given function h, we
define its normal derivative at k to be $h_k - h_{k'}$. The problem is to find a
function h which is regular on the transient states and has specified normal 
derivatives. 

40. Prove that hg = NRh^ for any solution. 

41. Modify the original chain so that instead of stopping at an absorbing 
state k, it moves to the neighboring k' with probability 1. Show that 
this new chain is recurrent.

42. Let P* be the transition matrix of the modified chain watched in B.
Prove that this is an ergodic chain. 

43. Show that the requirement that h have the specified normal derivatives
can be written as $(I - P^*)h_B = d$, where d has the given values as
components.

44. Prove that $\alpha^* d = 0$ is a necessary condition for a solution to exist.

45. Show that $\alpha^* d = 0$ is also sufficient by showing that d is a charge, that its
potential will serve as $h_B$, and that the h supplied by Problem 40 is a
solution.

46. Prove that the most general solution differs from the given one only by a 
constant. 

47. Prove that if the modified chain (indexed on all states) is a normal chain, 
then the most general solution is 




48. Show that if the transient states are the lattice points in a bounded 
convex set in n-dimensional Euclidean space and if the process moves as 
a symmetric random walk which is stopped when it moves out of the 
convex set, then we can apply the previous results. 

Problems 49 to 53 give a complete characterization of degenerate chains. 

49. Prove that if P is degenerate, so is $\hat P$.

50. Show that if P is degenerate and if we let i < j stand for = 1, then 
< is a simple ordering. 

51. Prove that k < i < j, then — 1. Deduce from this fact that, in 

moving to the right, the process can move at most one step at a time. 
[Hint: Consider for E = {k, j}.] 

52. Prove that the ordering of states must be that of the integers, the positive 
integers, or the negative integers. 

53. Show that the basic example and its reverse illustrate two of the possible 
orderings, and construct an example of the third.

1. Motivation for Martin boundary theory

For purposes of motivation it is convenient to think of the state
space of a Markov chain P with only transient states as being similar
to the open unit disk of two-dimensional Euclidean space. In two-space
the boundary of the disk, namely the circle, has the property
that there is a one-one correspondence between the non-negative
harmonic functions $h(re^{i\theta})$ in the disk and the non-negative Borel
measures $\mu^h$ on the circle. The correspondence is

$$h(re^{i\theta}) = \frac{1}{2\pi}\int_0^{2\pi} P(re^{i\theta}, t)\,d\mu^h(t), \qquad (*)$$

where $P(re^{i\theta}, t)$ is the Poisson kernel

$$P(re^{i\theta}, t) = \frac{1 - r^2}{1 - 2r\cos(\theta - t) + r^2}.$$

Transient boundary theory seeks an analogous representation theorem
for all non-negative P-regular functions defined on the state space.
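The disk representation can be checked numerically in a special case. The following is a minimal sketch, not part of the original text: it assumes the normalization in which the kernel is integrated against $dt/2\pi$, takes the measure to have a density f(t) with respect to dt, and uses hypothetical helper names (`poisson_kernel`, `poisson_integral`). With f = 1 the integral should return 1, and with f(t) = cos t it should return the harmonic function $r\cos\theta$.

```python
import math


def poisson_kernel(r, theta, t):
    """Poisson kernel of the unit disk at the interior point r*exp(i*theta)."""
    return (1.0 - r * r) / (1.0 - 2.0 * r * math.cos(theta - t) + r * r)


def poisson_integral(f, r, theta, n=2000):
    """Approximate (1/2pi) * integral over [0, 2pi) of P(r e^{i theta}, t) f(t) dt
    by an n-point uniform Riemann sum (very accurate for smooth periodic f)."""
    total = 0.0
    for k in range(n):
        t = 2.0 * math.pi * k / n
        total += poisson_kernel(r, theta, t) * f(t)
    return total / n


r, theta = 0.6, 1.1
total_mass = poisson_integral(lambda t: 1.0, r, theta)   # kernel has mean 1
recovered = poisson_integral(math.cos, r, theta)         # boundary values cos t
print(total_mass, recovered, r * math.cos(theta))
```

The last two printed numbers agree because $r\cos\theta$ is the harmonic function in the disk with boundary values cos t.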

The first problem that arises is to find what the analogs of the
circle (the boundary) and the kernel should be. We would like a
representation

$$h(i) = \int K(i, x)\,d\mu^h(x).$$

In the case of the disk, a calculation with Green’s identities shows that
any kernel $P(re^{i\theta}, t)$ giving rise to the correspondence (*) and satisfying

$$\frac{1}{2\pi}\int_0^{2\pi} P(re^{i\theta}, t)\,dt = 1$$

must be the normal derivative at t of the Green’s function for the disk
relative to the point $re^{i\theta}$. That is,

$$P(re^{i\theta}, t) = \frac{\partial}{\partial n} G(z, re^{i\theta})\Big|_{z=t}.$$

Application of l’Hospital’s rule shows that

$$\lim_{\substack{z \to t \\ \text{radially}}} \frac{G(z, re^{i\theta})}{G(z, p)} = \frac{P(re^{i\theta}, t)}{P(p, t)},$$

where p is any fixed reference point in the disk. Hence, except for
the positive factor P(p, t) which depends on t but not on $re^{i\theta}$, the Poisson
kernel is equal to

$$\lim_{\substack{z \to t \\ \text{radially}}} \frac{G(z, re^{i\theta})}{G(z, p)}. \qquad (**)$$

Therefore this last function may be used in the representation (*) in
place of the Poisson kernel; the distinction between the kernels is just a
normalizing factor (depending on t) which can be absorbed by changing
the measures.

Two comments are in order. First, the limit in (**) need not be
taken radially. Any method of approach of z to t, as long as z stays in
the interior of the disk, will give the same value. Second, the
considerations above apply equally well to any domain in n-dimensional
space with a sufficiently smooth boundary. Although the explicit
form of the kernel will vary from region to region, it will always be
connected to the Green’s function in the way we have just described.

R. S. Martin [1941] made use of these observations to define an ideal
boundary for an arbitrary domain in Euclidean space. If the Green’s
function for the region is denoted G(z, y), he noted that points t on the
ordinary topological boundary of the region did not necessarily have
the property that

$$\lim_{z \to t} \frac{G(z, y)}{G(z, p)}$$

exists. He suggested that distinct ideal boundary points u should be
associated to subsequences $\{z_n\}$ which yield distinct values for the limits

$$\lim_{z_n \to t} \frac{G(z_n, y)}{G(z_n, p)} = K(y, u).$$

He went on to show that the desired representation theorem is indeed
obtained in terms of this boundary and the kernel K(y, u).

Doob [1959], taking advantage of the fact that the N-matrix for a
transient Markov chain is the analog of the Green’s function (see
Proposition 7-4), showed that Martin’s approach could be used to
obtain a boundary for Markov chains. (We remark that $N_{ij}$
corresponds to G(j, i) with the indices in the reversed order.) As the analog
of Martin’s kernel he used limits on j of expressions of the form
$N_{ij}/N_{0j}$.

There is a minor restriction imposed by the Doob approach, namely
that $N_{0j}$ is assumed non-zero for all j. For a more general chain in
which it is not possible to get from state 0 to every other state, what
Doob did was to consider only those states that could be reached from
0. We shall not follow him in this respect. We simply use limits of
$N_{ij}/(\pi N)_j$ instead, where $\pi$ is a probability vector such that $\pi N$ is
strictly positive. In terms of this kernel there is a natural space to
try as the one corresponding to the closed unit disk. The space should
consist of one point for each possible limit of $N_{ij}/(\pi N)_j$. Actually we
shall find that this space is too large: the space has to be cut down
a bit for the representation to be unique. The price of uniqueness is
that the cut-down space is not compact.

The introduction of $\pi$ in place of 0 itself leads to a problem. The
representation will have to be restricted to $\pi$-integrable regular
functions h, those for which $\pi h$ is finite. This requirement evaporated in
Martin’s or Doob’s treatment because for them $\pi$ assigned unit mass to
a point 0 and h(0) was automatically finite.

Hunt [1960] gave a new approach to Martin boundary theory for
Markov chains which was more probabilistic in nature than Doob’s.
We follow Hunt’s probabilistic approach, except that we use a different
metric and get a boundary which is more like Doob’s.

2. Extended chains 

We begin by introducing the machinery which we shall use in later
sections to develop Martin boundary theory for Markov chains. We are
going to use a broader notion of Markov chain than we have been
considering so far, namely a process whose time index starts at $-\infty$
and whose behavior is Markovian only after it has entered certain sets.

That is, we extend the concept of Markov chain in two ways. First,
we shall allow any finite measure $\pi$ as a starting distribution. This is
only a minor modification in the theory and is a convenience in that it
removes the necessity of normalization in certain constructions.
Second, we shall extend the Markov chain to the past, that is, to a
stochastic process $\{x_n\}$ where n runs through all the integers (including
negative integers). This second extension is an essential one and
will be the main topic of this section.

First we must extend the concept of a stochastic process. The state
space will be a denumerable set S (with at least two elements) and two
distinguished other states a and b. The underlying set $\Omega$ of the
measure space will consist of all doubly infinite sequences

$$\omega = (\ldots, c_{-2}, c_{-1}, c_0, c_1, c_2, \ldots)$$

such that

(1) $c_n \in S$ or $c_n = a$ or $c_n = b$.

(2) If $c_n = a$, then $c_m = a$ for all $m < n$.

(3) If $c_n = b$, then $c_m = b$ for all $m > n$.

(4) $c_n \in S$ for at least one n.

The interpretation of (2) and (3) is that state a stands for “not yet
started” and state b stands for “stopped.” Thus (4) has the meaning
that each path in $\Omega$ represents some nontrivial possibility for the
process. We shall refer to $\Omega$ as a double sequence space.

We define the outcome functions $x_n$ as usual except that n may be
any integer. A basic cylinder set is any truth set in $\Omega$ of a statement of
the form

$$x_m = c_m \wedge x_{m+1} = c_{m+1} \wedge \cdots \wedge x_n = c_n,$$

where at least one $c_k$ is an element of S. The field generated by the
basic cylinder sets is denoted $\mathcal{F}$ and the smallest Borel field containing
$\mathcal{F}$ is $\mathcal{G}$.

Definition 10-1: An extended stochastic process $\{x_n\}$ is the set of
outcome functions for a measure space $(\Omega, \mathcal{G}, \Pr)$ such that

(1) $\Omega$ is a double sequence space with state space $S \cup \{a, b\}$,

(2) $\mathcal{G}$ is the smallest Borel field containing the field of cylinder sets
of $\Omega$,

(3) $\Pr[\{\omega \mid x_n = i\}] < \infty$ for every integer n and every i in S.

Note that we do not augment the measure space $(\Omega, \mathcal{G}, \Pr)$ by
allowing all subsets of sets of measure zero to be measurable.

We shall use interchangeably the notations Pr[P] and Pr[p], where P
is the truth set of the statement p. Thus the third condition may be
replaced by the condition $\Pr[x_n = i] < \infty$ for all n and for all i in S.
From it and from condition (4) in the definition of $\Omega$, we find that Pr
must be sigma-finite. However, the measure Pr need not be finite.

As promised, Definition 10-1 extends the definition of stochastic
process in two ways: The time index n runs through all the integers,
and the measure Pr need not be a probability measure or even a finite
measure.

In examples it is necessary to have a method of constructing the
measures for extended stochastic processes. If the measure Pr has
already been defined consistently on basic cylinder sets, we need a
version of the Kolmogorov Theorem to prove that Pr is completely
additive on $\mathcal{F}$. Theorem 1-19 then will give the extension of Pr to all
of $\mathcal{G}$. At this point, therefore, we stop to outline a proof that Pr is
completely additive on $\mathcal{F}$.

Now the argument of Lemma 2-1 easily shows that Pr is non-negative
and additive. Extend $\Omega$ to include the set of all doubly
infinite sequences of a’s, b’s, and elements of S, and define Pr to be zero
on all cylinder subsets of the set added. Then if Pr were a finite
measure, we could temporarily rearrange the time scale and then
conclude complete additivity by Theorem 2-4. But, in general, Pr is
merely sigma-finite and therefore we shall write it as the countable sum
of totally finite non-negative additive set functions, each of which is a
measure on cylinder sets depending on a bounded time interval.
Each of the summands is completely additive by the above argument,
and therefore Pr is completely additive by Lemma 1-3. Thus all we
need to do is decompose Pr as such a sum. The countable family of
statements indexed by $i \in S$ and by $n \ge 0$, consisting of

$$x_0 = i$$
$$x_{-n-1} = i \wedge x_{-n} = b$$
$$x_n = a \wedge x_{n+1} = i,$$

is a disjoint exhaustive family in the original double sequence space,
and each statement is assigned finite Pr-measure. For each of these
statements q, define

$$\Pr^{(q)}[E] = \Pr[E \cap \{\omega \mid q\}]$$

for E in the field of cylinder sets. Then the family $\{\Pr^{(q)}\}$ is the
required family of set functions.

We now fix our attention on a single extended stochastic process
$\{x_n\}$.

Although we may be dealing with an infinite measure space, the
conditional probability

$$\Pr[p \mid q] = \Pr[p \wedge q]/\Pr[q]$$

is still well defined as long as $0 < \Pr[q] < \infty$. We define $\Pr[p \mid q]$ to
be zero if $\Pr[q] = 0$.

Definition 10-2: For $E \subset S$ and for any $\omega \in \Omega$ such that $x_n(\omega) \in E$
for some n, let $u_E(\omega)$ be the infimum of all n such that $x_n(\omega) \in E$ and
let $v_E(\omega)$ be the supremum of all such n. Define $u = u_S$ and $v = v_S$;
u and v are called the initial time and the final time, respectively.

By condition (4) of the definition of $\Omega$, we see that $u(\omega)$ and $v(\omega)$ are
defined for all $\omega$. Moreover, $u_E(\omega) \le v_E(\omega)$ whenever $u_E(\omega)$ and
$v_E(\omega)$ are defined. The values $u_E(\omega) = -\infty$ and $v_E(\omega) = +\infty$ are
possible. If $x_n(\omega) \in E$ for some n, we have

$$x_n(\omega) = \begin{cases} a & \text{if } n < u(\omega) \\ \text{element of } S - E & \text{if } u(\omega) \le n < u_E(\omega) \\ \text{element of } S - E & \text{if } v_E(\omega) < n \le v(\omega) \\ b & \text{if } v(\omega) < n. \end{cases}$$
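The initial and final times can be made concrete for paths of finite lifetime. The sketch below is an illustration only, under assumptions that are not in the text: it represents just those paths for which both u and v are finite (paths with $u = -\infty$ or $v = +\infty$ cannot be stored this way), and the class name `FinitePath` and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

A, B = "a", "b"  # "not yet started" and "stopped"


@dataclass
class FinitePath:
    """A path with x_n = a for n < start, x_{start+k} = states[k], then b forever."""
    start: int
    states: List[str]

    def __post_init__(self) -> None:
        # Condition (4): at least one coordinate lies in S (so none listed is a or b).
        if not self.states or any(s in (A, B) for s in self.states):
            raise ValueError("states must be a non-empty list of elements of S")

    def x(self, n: int) -> str:
        """Outcome function x_n; conditions (2) and (3) hold by construction."""
        if n < self.start:
            return A
        k = n - self.start
        return self.states[k] if k < len(self.states) else B

    def u(self) -> int:
        """Initial time: the first n with x_n in S."""
        return self.start

    def v(self) -> int:
        """Final time: the last n with x_n in S."""
        return self.start + len(self.states) - 1

    def u_v_of(self, E: Set[str]) -> Optional[Tuple[int, int]]:
        """(u_E, v_E), or None when the path never enters E."""
        times = [self.start + k for k, s in enumerate(self.states) if s in E]
        return (times[0], times[-1]) if times else None


path = FinitePath(start=-2, states=["p", "q", "r", "q", "s"])
E = {"q"}
u_E, v_E = path.u_v_of(E)
print(path.u(), u_E, v_E, path.v())
```

For this path the times satisfy $u \le u_E \le v_E \le v$, and the coordinates strictly between u and $u_E$, or between $v_E$ and v, lie in S - E, as in the display above.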

Proposition 10-3: The functions $u_E$ and $v_E$ have a $\mathcal{G}$-measurable
subset of $\Omega$ as domain and are each $\mathcal{G}$-measurable.

Proof: We prove the result for $u_E$. We have

$$\{\omega \mid u_E(\omega) \le k\} = \bigcup_{i \in E} \bigcup_{n=-\infty}^{k} \{\omega \mid x_n(\omega) = i\},$$

and the union of these sets on k is the domain of $u_E$.

Definition 10-4: Let E be a subset of S and define

$$y_n(\omega) = x_{n + u_E(\omega)}(\omega)$$

for all $n \ge 0$ and for all $\omega$ such that $u_E(\omega) > -\infty$. Let $\Omega'$ be the
ordinary sequence space with state space $S \cup \{b\}$ with the measure of
each measurable set $A \subset \Omega'$ defined to be

$$\Pr[\{\omega \mid (y_0(\omega), y_1(\omega), \ldots) \in A\}].$$

The measure space $\Omega'$ and its outcome functions together are called the
process watched starting in E.

The process watched starting in a set E is an ordinary stochastic
process, except that the starting distribution need not be a probability
measure and can possibly be infinite.
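On a single path the passage to the watched process is just a shift of the time origin to the first entry into E. A small sketch, assuming the same finite-lifetime representation of a path as a starting time plus a list of states in S; the function name `watched_from` is hypothetical.

```python
from typing import List, Optional, Set, Tuple


def watched_from(start: int, states: List[str], E: Set[str]) -> Optional[Tuple[int, List[str]]]:
    """Return (u_E, [y_0, y_1, ...]) for a path of finite lifetime.

    The path is x_n = a for n < start, x_{start+k} = states[k], and b afterwards.
    The returned list ends with a single "b", which stands for b repeated forever.
    None is returned when the path never enters E, so that u_E is undefined.
    """
    for k, s in enumerate(states):
        if s in E:
            return start + k, states[k:] + ["b"]
    return None


u_E, y = watched_from(-2, ["p", "q", "r", "q", "s"], {"q"})
print(u_E, y)
```

Here `y[0]` always lies in E, and the portion of the path before the first entry into E is discarded.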

Let $j \in S$. The mean number $\nu_j$ of times that the process $\{x_n\}$ is in j
can, as usual, be computed by

$$\nu_j = \sum_n \Pr[x_n = j],$$

except the sum is over all integers n. By Definition 10-1, each
summand is finite, but the sum may be infinite.

Definition 10-5: An extended stochastic process $\{x_n\}$ is an extended
chain with transition matrix P if the following conditions hold for each
finite subset E of S:

(1) The domain of $u_E$ has positive measure.

(2) $\Pr[u_E = -\infty] = 0$.

(3) The process watched starting in E is a Markov chain with
transition matrix P and finite starting measure (but not
necessarily a probability measure).

(4) For all $j \in S$, $\nu_j < \infty$.

Note that the transition matrix P of an extended chain necessarily
satisfies P1 = 1. The state space of P never needs to be any bigger
than $S \cup \{b\}$, but as we shall see shortly it must contain S. If b is in
the state space, it clearly must be an absorbing state.

From the definition of the process watched starting in E, we see that
the total starting measure for the process is equal to the measure of the
set of paths on which there is a first entry to E. That is, it is the
measure of the set where $u_E > -\infty$. By conditions (1) and (2), this
measure is positive. Hence the process watched starting in E has a
starting measure which is not identically zero.

Applying this observation to the one-point set {j}, we see that j
must be included in the state space of P.

Let $\{x_n\}$ be an extended stochastic process satisfying (1), (2), and
(3). We shall derive as Proposition 10-6 a necessary and sufficient
condition for (4) to hold. Let E be a finite set of S. We introduce the
abbreviations

$$\mu^{E,m}_i = \Pr[u_E = m \wedge x_m = i]$$

and

$$\mu^E_i = \sum_m \mu^{E,m}_i.$$

Then $\mu^E$ is the starting measure of the process watched starting in E
and is a finite measure with support in E by (3). Our remarks above
showed that $\mu^E$ is not identically zero.

Since the process watched starting in E is a Markov chain and since
(2) holds, the following computation is justified: For $j \in E$,

$$\Pr[x_n = j \wedge x_{n+1} = k \wedge x_{n+2} = s] = \sum_{m,i} \mu^{E,m}_i P^{(n-m)}_{ij} P_{jk} P_{ks},$$

where $P^{(n-m)}_{ij} = 0$ if $n < m$. A similar computation of Pr[p] is
possible for any p such that p is false if E is never entered and p
depends only on outcomes after E is entered.

As an application of this calculation, we can relate condition (4) to
properties of P.

Proposition 10-6: Let $\{x_n\}$ be an extended stochastic process satisfying
conditions (1), (2), and (3) for an extended chain. Then for $j \in E$,

$$\nu_j = (\mu^E N)_j.$$

Moreover, the process is an extended chain if and only if P is the
transition matrix of a transient chain whose transient states are S
and which may have b as an absorbing state.

Proof: We have

$$\nu_j = \sum_n \Pr[x_n = j] = \sum_{n,m,i} \mu^{E,m}_i P^{(n-m)}_{ij} = \sum_{m,i} \mu^{E,m}_i N_{ij} = \sum_i \mu^E_i N_{ij}$$

or

$$\nu_j = (\mu^E N)_j.$$

Taking E = {j}, we find

$$\nu_j = \mu^{\{j\}}_j N_{jj}.$$

Now $\mu^{\{j\}}_j$ is assumed finite by (3) and it is strictly positive by (1). Hence
$\nu_j$ is finite for all $j \in S$ if and only if all elements of S are transient for a
Markov chain with transition matrix P. Furthermore, a cannot occur
as a state, and if b occurs, it must be an absorbing state.

Corollary 10-7: Let $\{x_n\}$ be an extended stochastic process satisfying
conditions (1), (2), and (3) for an extended chain. Then $\nu_j > 0$ for
all j.

Proof: We have $\nu_j = \mu^{\{j\}}_j N_{jj}$ and each factor on the right side is
positive.

An important example, but by no means the most general example,
of an extended chain is obtained as follows. Let P be a Markov chain
with all states transient and let $\pi$ be a starting distribution such that
$\pi N$ is strictly positive. (For instance, let $\pi$ assign weight $2^{-n}$ to the
nth state.) Let $\bar P$ be the enlarged chain obtained by adding the
absorbing state b. Form an extended stochastic process by defining a
measure on cylinder sets as follows: Every basic statement containing
the assertion $x_n = i$ for $n < 0$ and $i \ne a$ or the assertion $x_m = a$ for
$m \ge 0$ gets probability zero. The statement

$$x_{-m} = a \wedge \cdots \wedge x_{-1} = a \wedge x_0 = i \wedge x_1 = j \wedge x_2 = k \wedge \cdots \wedge x_{n-1} = r \wedge x_n = s$$

gets probability $\pi_i \bar P_{ij} \bar P_{jk} \cdots \bar P_{rs}$. The probabilities of all other basic
cylinder statements can be obtained from these by adding the
probabilities of a suitable number of statements of the form just described.
The claim is that this extended stochastic process $\{x_n\}$ is an extended
chain with transition matrix $\bar P$. Property (1) follows from the fact
that $\pi N > 0$. Property (2) is an immediate consequence of the
definition of the process. Property (3) follows from Theorem 4-9;
the starting distribution for the process watched starting in E is $\pi B^E$.
Finally Property (4) comes from Proposition 10-6; alternatively we
could compute directly that the mean number of times in state j is
$(\pi N)_j$. We shall call this process the extended chain associated with $\pi$
and P.
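For a finite state space the identity of Proposition 10-6 can be checked exactly in the extended chain associated with $\pi$ and P. The sketch below makes several assumptions that are not in the text: a hypothetical three-state substochastic matrix P (the missing row mass goes to b), a hypothetical starting vector, and helper names `inverse`, `vecmat`, and `entrance_matrix`. It takes $\nu = \pi N$ and computes the starting measure of the process watched starting in E as $\pi$ times the matrix of first-entrance probabilities into E.

```python
from fractions import Fraction as Fr


def inverse(M):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(M)
    aug = [list(row) + [Fr(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def vecmat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def entrance_matrix(P, E):
    """B[i][j] = probability that the chain started at i first enters E at j."""
    n = len(P)
    out = [i for i in range(n) if i not in E]
    # Mean visits among the states outside E before E is entered.
    N_out = inverse([[Fr(int(r == c)) - P[r][c] for c in out] for r in out])
    B = [[Fr(0)] * n for _ in range(n)]
    for i in E:
        B[i][i] = Fr(1)
    for a, i in enumerate(out):
        for j in E:
            B[i][j] = sum(N_out[a][c] * P[k][j] for c, k in enumerate(out))
    return B


# Hypothetical transient chain: each row sums to 3/4, the rest is absorbed at b.
P = [[Fr(0), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 4), Fr(0), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 4), Fr(0)]]
pi = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
size = len(P)
N = inverse([[Fr(int(i == j)) - P[i][j] for j in range(size)] for i in range(size)])

nu = vecmat(pi, N)                         # mean time in each state
E = [0, 1]
mu_E = vecmat(pi, entrance_matrix(P, E))   # starting measure of the watched process
nu_E = vecmat(mu_E, N)                     # mean time in each state after entering E
print(nu, mu_E, nu_E)
```

On E the vectors `nu` and `nu_E` coincide, which is the statement $\nu_j = (\mu^E N)_j$ for j in E; off E the second is no larger than the first.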

If $\{x_n\}$ is an extended chain and E is a finite set in S, we define
$\nu^E = \mu^E N$. We know that the process watched starting in E is a
Markov chain with starting distribution $\mu^E$. Hence $\nu^E_j$ is the mean
number of times in j in this Markov chain. That is, it is the mean
number of times in j after entering E in the extended chain. From this
interpretation we see that $\nu^E$ is monotone increasing in E and that
$\nu^E_j = \nu_j$ for $j \in E$. In order to get an interpretation of $\nu^E$ and $\mu^E$ in
terms of $\nu$, we shall generalize the notion of balayage potential as
defined in Chapter 8.

If P is a Markov chain with all states transient, if h is a non-negative
finite-valued superregular function, and if E is a finite set of states, we
define the balayage potential of h on E to be the function $B^E h$. By
Lemma 8-22, $B^E h$ is a pure potential with support in E. Now $B^E h$ is
the unique pure potential with support in E which agrees with h on E,
since if $\bar h$ is another such potential, $\bar h$ must be the unique balayage
potential of $B^E h$ on E (Theorem 8-46) and hence must equal $B^E h$.
Moreover, if $E_n$ is an increasing sequence of finite sets with union the
set of all states, then the charges of $B^{E_n} h$, namely $(I - P)(B^{E_n} h)$,
converge to $(I - P)h$, since

$$\lim_n P B^{E_n} h = Ph$$

by part (4) of Proposition 8-16 and by monotone convergence.

Let us dualize these results. Let $\nu$ be a non-negative finite-valued
superregular measure, and let E be a finite set. Then there is a unique
pure potential measure with support in E which agrees with $\nu$ on E.
This potential is defined to be the balayage potential of $\nu$ on E. It
has the property that if $E_n$ is an increasing sequence of sets with union
the set of all states, then the balayage charges of $\nu$ on $E_n$ converge to
$\nu(I - P)$. By Proposition 6-16, the balayage potential of $\nu$ on E is

$$(\nu_E \quad \nu_E U ({}^E N)_{\tilde E}),$$

where, with the states of E listed first,

$$P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix} \quad \text{and} \quad ({}^E N)_{\tilde E} = \sum_k Q^k.$$

If we let $\nu^E$ be the balayage potential of $\nu$ on E and if $\mu^E$ is the charge
of $\nu^E$, then $E \subset F$ implies $\mu^E = \mu^F B^E$. To see this equality, we note
that $\mu^E$ and $\mu^F B^E$ are both pure charges with support in E and that the
potential of $\mu^F B^E$, when restricted to E, is

$$(\mu^F B^E N)_E = (\mu^F N - \mu^F\, {}^E N)_E = (\mu^F N)_E = \nu_E = (\mu^E N)_E$$

by Lemma 8-17. Hence the potentials of $\mu^E$ and $\mu^F B^E$ agree on E,
and they must therefore agree everywhere. Thus

$$\mu^E = \mu^F B^E$$

by Theorem 8-4.

Our characterization of $\nu^E$ and $\mu^E$ in terms of balayage potentials is
the content of the next proposition.

Proposition 10-8: For every extended chain with transition matrix P,

(1) $\nu$ is a superregular measure for $P_S$.

(2) $\nu(I - P_S) = \mu$, where $\mu = \lim_E \mu^E$ as E increases to S.

(3) $\nu^E$ and $\mu^E$ are the balayage potential and charge, respectively, for
$\nu$ on E.

Proof: We have

$$\nu^E P_S = \mu^E N P_S = \mu^E (N - I) = \nu^E - \mu^E.$$

Along any increasing sequence of sets $E_n$ with union S, $\nu^{E_n}$ increases to
$\nu$. Hence by monotone convergence

$$\nu P_S = \nu - \lim_n \mu^{E_n}.$$

This equality implies that

$$\nu P_S = \nu - \lim_{E \uparrow S} \mu^E$$

and proves (1) and (2). To prove (3) we need only remark that $\nu^E$ is a
pure potential with support in E which agrees with $\nu$ on E. Hence it
must be the balayage potential.
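The statements of Proposition 10-8 can be tested exactly on a finite example. The following sketch rests on assumptions that are not in the text: the three-state substochastic matrix P and the vector $\pi$ are hypothetical, $\nu$ is taken to be $\pi N$, and the balayage charge on E is found by solving $(\mu^E N)_E = \nu_E$ for a measure supported on E. The helper names (`inverse`, `vecmat`, `balayage`) are illustrative.

```python
from fractions import Fraction as Fr


def inverse(M):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(M)
    aug = [list(row) + [Fr(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def vecmat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def balayage(nu, N, E):
    """Return (mu_E, nu_E): the charge supported on E and its potential mu_E N,
    chosen so that the potential agrees with nu on E."""
    N_EE = [[N[i][j] for j in E] for i in E]
    coeffs = vecmat([nu[i] for i in E], inverse(N_EE))
    mu = [Fr(0)] * len(nu)
    for c, i in zip(coeffs, E):
        mu[i] = c
    return mu, vecmat(mu, N)


# Hypothetical transient chain: each row sums to 3/4, the rest is absorbed at b.
P = [[Fr(0), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 4), Fr(0), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 4), Fr(0)]]
pi = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
size = len(P)
N = inverse([[Fr(int(i == j)) - P[i][j] for j in range(size)] for i in range(size)])
nu = vecmat(pi, N)

results = [balayage(nu, N, E) for E in ([0], [0, 1], [0, 1, 2])]
for mu_E, nu_E in results:
    # nu^E P = nu^E - mu^E, as in the first line of the proof.
    assert vecmat(nu_E, P) == [a - b for a, b in zip(nu_E, mu_E)]
mu_S, nu_S = results[-1]
print(mu_S, nu_S)
```

When E is the whole state space the charge equals $\nu(I - P)$, which for $\nu = \pi N$ is $\pi$ itself, and the potentials increase with E.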

Proposition 10-8 has as a converse the following theorem, which
asserts roughly that any superregular measure can be represented as
the vector of mean times in the states of some extended chain. This
result will not be used until Section 11, and its proof, which is quite
long, will not be given until after that section.

Theorem 10-9: Let P* be the transition matrix of a transient chain
with P*1 = 1 and with at most one absorbing state b, and let $\nu$ be a
non-negative finite-valued measure defined on the transient states and
superregular for the restriction of P* to the transient states. Let S
be the support of $\nu$, and suppose S has at least two elements. Then
there is an extended chain $\{x_n\}$ having $P^*_{S \cup \{b\}}$ as transition matrix and
having $\nu$ as its vector of mean times in the states of S. Furthermore, if

$$\nu_S = \mu N + \rho$$

is the unique decomposition of $\nu_S$ with $\rho$ regular for $P^*_S$, then $\mu N$ is
contributed by the paths with $u(\omega) > -\infty$ and $\rho$ is contributed by
the paths with $u(\omega) = -\infty$.

To conclude this section we shall define what we mean by the reverse
of an extended stochastic process, and we shall prove that the reverse of
an extended chain is an extended chain. The transition matrix of the
reverse, at least when restricted to S, will turn out to be the $\nu$-dual of
the transition matrix, restricted to S, of the original process. We
need the following lemma, whose proof uses the calculation preceding
Proposition 10-6.

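Lemma 10-10: If $\{x_n\}$ is an extended chain with transition matrix P
and if k is in a finite set E in S, then

$$\Pr[x_{v_E - 2} = i \wedge x_{v_E - 1} = j \wedge x_{v_E} = k] = \nu_i P_{ij} P_{jk} e^E_k,$$

where $e^E$ is the escape vector for E.

Proof: With F denoting the finite set $E \cup \{i\}$,

$$\Pr[x_{v_E - 2} = i \wedge x_{v_E - 1} = j \wedge x_{v_E} = k]$$
$$= \sum_n \Pr[x_n = i \wedge x_{n+1} = j \wedge x_{n+2} = k \wedge \text{not in } E \text{ after time } (n + 2)]$$
$$= \sum_{m,s,n} \mu^{F,m}_s P^{(n-m)}_{si} P_{ij} P_{jk} e^E_k \quad \text{by the calculation}$$
$$= \sum_n \Pr[x_n = i] P_{ij} P_{jk} e^E_k \quad \text{by the calculation again}$$
$$= \nu_i P_{ij} P_{jk} e^E_k.$$

For any point $\omega$ in $\Omega$ we define $\omega'$ to be the point in $\Omega$ with

$$x_n(\omega') = \begin{cases} x_{-n}(\omega) & \text{if } x_{-n}(\omega) \in S \\ a & \text{if } x_{-n}(\omega) = b \\ b & \text{if } x_{-n}(\omega) = a. \end{cases}$$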





Definition 10-11: Let $\{x_n\}$ be an extended chain defined on $\Omega$ with
measure Pr. Set

$$\Pr'[\omega \in A] = \Pr[\omega' \in A]$$

for all sets A for which the right side is defined. The extended
stochastic process on $\Omega$ defined by the measure Pr' is called the reverse of
the extended chain.

Proposition 10-12: If $\{x_n\}$ is an extended chain with transition matrix
P, then its reverse is also an extended chain and its transition matrix $\hat P$
satisfies

$$\hat P_{ji} = \nu_i P_{ij}/\nu_j$$

for all states i and j in S.

Proof: From the definition of $\omega'$, we see that

$$u_E(\omega) = -v_E(\omega').$$

Hence

$$\Pr'[\omega \in \text{domain } u_E] = \Pr[\omega \in \text{domain } v_E] = \Pr[\omega \in \text{domain } u_E] > 0$$

and (1) holds. Since the chain watched starting in E is in the finite set
E infinitely often with probability 0 (second half of Proposition 10-6),

$$0 = \Pr[v_E = +\infty] = \Pr'[u_E = -\infty].$$

Thus (2) holds.

Next, we show that the reverse process watched starting in E is a
Markov chain with transition matrix $\hat P$. We shall compute only a
typical conditional probability: Let $k \in E$ and first suppose $i \ne b$.
Since

$$\Pr'[x_{u_E} = k \wedge x_{u_E + 1} = j \wedge x_{u_E + 2} = i] = \Pr[x_{v_E - 2} = i \wedge x_{v_E - 1} = j \wedge x_{v_E} = k] = \nu_i P_{ij} P_{jk} e^E_k$$

by Lemma 10-10, we have, provided the condition has positive
Pr'-measure,

$$\Pr'[x_{u_E + 2} = i \mid x_{u_E} = k \wedge x_{u_E + 1} = j] = (\nu_i P_{ij} P_{jk} e^E_k)/(\nu_j P_{jk} e^E_k) = \nu_i P_{ij}/\nu_j = \hat P_{ji}.$$

Next we compute the typical probability

$$\Pr'[x_{u_E} = k \wedge x_{u_E + 1} = j \wedge x_{u_E + 2} = b].$$

Since we know b is absorbing, we may assume $j \ne b$. Then this
probability is

$$\Pr'[x_{u_E} = k \wedge x_{u_E + 1} = j] - \Pr'[x_{u_E} = k \wedge x_{u_E + 1} = j \wedge x_{u_E + 2} \in S] = \nu_j P_{jk} e^E_k - \sum_{i \in S} \nu_i P_{ij} P_{jk} e^E_k.$$

Hence if $\Pr'[x_{u_E} = k \wedge x_{u_E + 1} = j] > 0$, then

$$\Pr'[x_{u_E + 2} = b \mid x_{u_E} = k \wedge x_{u_E + 1} = j] = \Big(\nu_j - \sum_{i \in S} \nu_i P_{ij}\Big)\Big/\nu_j.$$

(Notice this probability is non-negative because $\nu$ is $P_S$-superregular.)
Therefore the reverse process watched starting in E is a Markov chain.

The total starting measure is finite for the reverse process watched
starting in E because

$$\sum_i \Pr'[x_{u_E} = i] = \sum_i \Pr[x_{v_E} = i] = \sum_{i \in E} \nu_i e^E_i < \infty$$

by Lemma 10-10, by (4) for the given process, and by the finiteness of the
set E. Hence (3) holds.

Finally we prove (4) for the reverse process. The same argument
as in Proposition 6-12 shows that

$$\hat N_{ji} = \nu_i N_{ij}/\nu_j.$$

Hence $\hat P$ is transient and (4) holds by Proposition 10-6.
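The duality formulas for the reverse can be verified exactly when S is finite. The sketch below uses a hypothetical three-state substochastic matrix P and starting vector $\pi$ (neither is from the text), takes $\nu = \pi N$ as in the extended chain associated with $\pi$ and P, and forms the $\nu$-dual matrix. The helper names `inverse` and `vecmat` are illustrative.

```python
from fractions import Fraction as Fr


def inverse(M):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(M)
    aug = [list(row) + [Fr(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def vecmat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def fundamental(P):
    """N = (I - P)^{-1}, the matrix of mean numbers of visits."""
    n = len(P)
    return inverse([[Fr(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


# Hypothetical transient chain: each row sums to 3/4, the rest is absorbed at b.
P = [[Fr(0), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 4), Fr(0), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 4), Fr(0)]]
pi = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
size = len(P)
N = fundamental(P)
nu = vecmat(pi, N)

# Dual matrix restricted to S: P_hat[j][i] = nu_i P_ij / nu_j.
P_hat = [[nu[i] * P[i][j] / nu[j] for i in range(size)] for j in range(size)]
N_hat = fundamental(P_hat)

# Probability that the reverse steps from j to the stopped state b.
to_b = [1 - sum(row) for row in P_hat]
print(P_hat, to_b)
```

In this example the reverse stops from state j with probability $\pi_j/\nu_j$, which reflects the fact that the original process starts at j with measure $\pi_j$.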



3. Martin exit boundary 

We now define the Martin exit boundary of a transient chain with 
respect to a given starting distribution. With this boundary we shall 
be able to describe the long-range behavior of the process and we shall 
obtain a Poisson integral-type representation for all finite-valued non- 
negative superregular functions which are integrable with respect to 
the starting distribution. 

Let $P$ be a Markov chain with all states transient and let $\pi$ be a starting vector ($\pi \ge 0$ and $\pi 1 = 1$). Throughout our discussion $P$ and $\pi$ will be fixed. The vector $\pi N$ is non-negative, finite-valued, and superregular. Ordinarily we shall assume that $\pi$ has been chosen so that $\pi N$ is strictly positive; that is, so that there is positive probability of reaching any state eventually. But for technical reasons which will arise when we consider $h$-processes, it will be convenient to adopt conventions about what to do when $\pi N$ has some zero entries. These conventions we shall discuss at the end of this section, and except when stated otherwise we shall assume that $\pi N$ is everywhere positive. Let $S$ be the state space of $P$.

Since $N_{\pi j} = (\pi N)_j$ has been assumed positive, we may define

    $K(i,j) = N_{ij} / N_{\pi j}$.

The notation $K(\cdot, j)$ will mean $K(i,j)$ considered as a function of the first variable with $j$ fixed. Then for each $j$, $K(\cdot, j)$ is a non-negative finite-valued superregular function with $\pi K(\cdot, j) = 1$. It is regular everywhere except at $j$, where it is strictly superregular.
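
The properties just listed for K(., j) can be verified directly on a finite example. This is a minimal sketch, assuming the kernel is K(i,j) = N_ij / (pi N)_j; the matrix P, the vector pi, and the helper names are hypothetical choices made for illustration only.

```python
from fractions import Fraction as F

def inverse(A):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [A[r][:] + [F(int(r == c)) for c in range(n)] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def fundamental(P):
    """N = (I - P)^{-1}, the matrix of mean numbers of visits."""
    n = len(P)
    return inverse([[F(int(i == j)) - P[i][j] for j in range(n)]
                    for i in range(n)])

# Hypothetical transient chain and starting vector with pi N > 0.
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(1, 4), F(1, 4), F(0)]]
pi = [F(1, 2), F(1, 2), F(0)]
n = len(P)

N = fundamental(P)
piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]

# Martin kernel K(i, j) = N_ij / (pi N)_j.
K = [[N[i][j] / piN[j] for j in range(n)] for i in range(n)]

# pi K(., j) for each j, and (P K(., j))_i for each i and j.
piK = [sum(pi[i] * K[i][j] for i in range(n)) for j in range(n)]
PK = [[sum(P[i][k] * K[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
```

Here piK is identically one, PK[i][j] equals K[i][j] whenever i differs from j (regularity away from j), and PK[j][j] is strictly smaller than K[j][j] (strict superregularity at j).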

For fixed $i$ the function $K(i, \cdot)$ is bounded, since

    $K(i,j) = N_{ij}/N_{\pi j} = \hat{N}_{ji}/N_{\pi i} \le \hat{N}_{ii}/N_{\pi i} = N_{ii}/N_{\pi i}$,

where carets denote duality with respect to $\pi N$.

The real-valued functions $d_i$ defined on $S \times S$ by

    $d_i(j,j') = |K(i,j) - K(i,j')|$

have the property that $\{K(i,j_n)\}$ is a Cauchy sequence for all $i$ in $S$ if and only if $\lim_{m,n} d_i(j_m, j_n) = 0$ for all $i$. According to the bound we just computed for $K(i,j)$, we may lump the functions $d_i$ into the single finite-valued function $d$ defined by

    $d(j,j') = \sum_{i \in S} w_i N_{\pi i} N_{ii}^{-1} |K(i,j) - K(i,j')|$,

where the $w_i$ are positive weights such that $\sum_i w_i$ is finite.

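We shall show that $d$ is a metric for $S$. Clearly $d$ satisfies all the conditions of a metric except possibly that $d(j,j') = 0$ implies $j = j'$. But if $d(j,j') = 0$, then, since $w_i N_{\pi i} N_{ii}^{-1} > 0$ for all $i$, we must have

    $K(i,j) = K(i,j')$ for all $i$.

Multiplying through by $P$ and supposing that $j \ne j'$, we obtain

    $K(j',j) = \sum_i P_{j'i} K(i,j) = \sum_i P_{j'i} K(i,j') < K(j',j')$,

the strict inequality holding, since $K(\cdot, j')$ is not regular at $j'$. This contradicts $K(j',j) = K(j',j')$. We conclude that $j = j'$ and that $d$ is a metric.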
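
As an illustration, the metric can be computed for a finite chain. The sketch below assumes the weighted form d(j,j') = sum_i w_i (pi N)_i / N_ii |K(i,j) - K(i,j')|, which is our reading of the definition; the chain P, the vector pi, the weights w_i = 2^-(i+1), and the helper names are all hypothetical.

```python
from fractions import Fraction as F

def inverse(A):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [A[r][:] + [F(int(r == c)) for c in range(n)] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def fundamental(P):
    """N = (I - P)^{-1}, the matrix of mean numbers of visits."""
    n = len(P)
    return inverse([[F(int(i == j)) - P[i][j] for j in range(n)]
                    for i in range(n)])

# Hypothetical transient chain, starting vector, and summable weights.
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(1, 4), F(1, 4), F(0)]]
pi = [F(1, 2), F(1, 2), F(0)]
n = len(P)
w = [F(1, 2 ** (i + 1)) for i in range(n)]

N = fundamental(P)
piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]
K = [[N[i][j] / piN[j] for j in range(n)] for i in range(n)]

def d(j, jp):
    """Assumed Martin metric: weighted l1 distance between kernel columns."""
    return sum(w[i] * piN[i] / N[i][i] * abs(K[i][j] - K[i][jp])
               for i in range(n))

D = [[d(j, jp) for jp in range(n)] for j in range(n)]
```

Because each term of the sum is at most w_i, every distance is bounded by the total weight. A finite metric space is already complete, so in this example S* = S and the boundary is empty; the boundary only appears for infinite state spaces.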



Proposition 10-13: A sequence $\{j_n\}$ in the metric space $(S,d)$ is Cauchy if and only if the sequence of real numbers $\{K(i,j_n)\}$ is Cauchy for every $i$.

Proof: If $\{j_n\}$ is Cauchy, then certainly $\{w_i N_{\pi i} N_{ii}^{-1} K(i,j_n)\}$ is Cauchy. Since $w_i N_{\pi i} N_{ii}^{-1} > 0$, $\{K(i,j_n)\}$ is Cauchy.

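Conversely, let $\{K(i,j_n)\}$ be Cauchy for all $i$, and let $\epsilon > 0$ be given. Choose a finite set of states $E$ such that

    $\sum_{k \in S-E} w_k < \epsilon/2$

and choose $M$ sufficiently large that

    $w_i N_{\pi i} N_{ii}^{-1} |K(i,j_m) - K(i,j_n)| < \epsilon/(2|E|)$

for $i \in E$ and for all $n, m \ge M$, where $|E|$ is the number of states in $E$. Since each term of the sum defining $d$ is at most $w_i$, we then have $d(j_m, j_n) < \epsilon$.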

We define $S^*$ to be the Cauchy completion of the metric space $(S, d)$, and we let $B = S^* - S$. The set $B$ is the Martin exit boundary for the chain $P$ started with distribution $\pi$.

The set $B$ is not necessarily a boundary in the topological sense, since there are examples in which it is not a closed set in $S^*$, but the abuse of notation will not disturb us.

From Proposition 10-13 we see that $K(i, \cdot)$ is a uniformly continuous function on $S$. Hence it extends uniquely to a continuous function on $S^*$. We shall use the same notation $K(i, \cdot)$ for the function on $S^*$, but will normally denote points of $B = S^* - S$ by $x$ or $y$.

The characterization of Cauchy sequences given in Proposition 10-13 shows that the nature of the space $S^*$ does not depend upon the choice of the weights $w_i$. That is, the Cauchy completions of $S$ corresponding to two different choices of weights are homeomorphic.

Since $K(i, \cdot)$ is continuous on $S^*$, it follows that the extension of $d$ to $S^*$ is simply

    $d(x,y) = \sum_{i \in S} w_i N_{\pi i} N_{ii}^{-1} |K(i,x) - K(i,y)|$.

A repetition of the argument in Proposition 10-13 then shows that $\{x_n\}$ is Cauchy in $S^*$ if and only if $\{K(i, x_n)\}$ is Cauchy for each $i$. Applying this result to the sequence whose terms are alternately $x$ and then $y$, we find that $x = y$ if and only if $K(i, x) = K(i, y)$ for all $i$. We state this conclusion as a proposition.

Proposition 10-14: $K(i, x) = K(i, y)$ for all $i$ if and only if $x = y$.

Proposition 10-15: The space $S^*$ is compact.

Proof: Since $S^*$ is a metric space, it is enough to prove that any sequence $\{x_n\}$ has a convergent subsequence. Now

    $K(i, x_n) \le \sup_j K(i,j) \le N_{ii}/N_{\pi i} < \infty$.

Thus by a diagonal process we may choose a subsequence $\{x_{n_k}\}$ such that $\{K(i, x_{n_k})\}$ is Cauchy for all $i$. Then $\{x_{n_k}\}$ is Cauchy in $S^*$. Since $S^*$ is complete, $\{x_{n_k}\}$ is convergent.

The sets in the smallest Borel field containing the open sets of S* are 
called Borel sets. The boundary B is a Borel set, since it is the comple- 
ment of a countable set. Finite measures defined on the Borel sets are 
called Borel measures. 

We conclude this section by agreeing on what conventions we shall adopt in case $\pi N$ has some zero entries. If $S$ is the set of all states, let

    $W = \{i \in S \mid (\pi N)_i > 0\}$.

The special nature of $W$ implies that $P^W = P_W$, and by Lemma 8-18 we see that the fundamental matrix for $P_W$ is $N_W$. Thus if we agree that boundary theory for $P$ and $\pi$ is to be interpreted as boundary theory for $P_W$ and $\pi_W$, we find that for $i$ and $j$ in $W$

    $K(i,j) = N_{ij}/(\pi N)_j$

and $K(i,j)$ is not defined otherwise. Hence the metric is

    $d(j,j') = \sum_{i \in W} w_i N_{\pi i} N_{ii}^{-1} |K(i,j) - K(i,j')|$

for $j$ and $j'$ in $W$. Boundary theory is then done relative to the Cauchy completion of $(W, d)$.



4. Convergence to the boundary 

We continue to assume that $P$ is a Markov chain with all states transient and that $\pi$ is a starting distribution with $\pi N > 0$. Let $\bar{P}$ denote the enlarged chain obtained from $P$ by adding the absorbing state $b$.

The main theorem of this section will be that with probability one every path $\omega$ has the property that either $x_n(\omega)$ converges in $S^*$ or the process along $\omega$ disappears in finite time. From the results of the next sections we shall be able to sharpen the theorem by concluding that, when convergence takes place, it is a.e. to a nice subset of the boundary.

Lemma 10-16: Let $g$ be a pure potential for $P$ with charge $f$. If $\{x_n\}$ is an extended chain of total measure one with transition matrix $P$ for which $\nu f$ is finite, then the limit of $g(x_n)$ as $n$ decreases to $u$ exists a.e.

Proof: Form the process watched starting in the finite set $E$. The claim is that for $n \ge 0$, $\{g(x_{u_E+n})\}$ is a non-negative supermartingale. It satisfies the supermartingale inequality because $g$ is non-negative superregular. Thus, to show that the means are finite, it is sufficient to consider $M[g(x_{u_E})]$. We have

    $M[g(x_{u_E})] = \mu^E g = \mu^E N f = \nu^E f \le \nu f < \infty$,

since $f$ is non-negative and $\nu^E \le \nu$. Hence $\{g(x_{u_E+n})\}$ is a non-negative supermartingale.

If $0 < r < s$, then Proposition 3-11 applied to $-g$ (or Proposition 8-79) shows that the mean number of downcrossings of $[r, s]$ by $\{g(x_{u_E+j})\}$ up to time $n$ is bounded by $s/(s - r)$ independently of $n$ and of $E$. Let $n \to \infty$ and then let $E$ increase so that $u_E$ approaches $u$. The mean number of downcrossings of $[r, s]$ remains bounded, and by monotone convergence the mean number of downcrossings after time $u$ is finite. By the argument in the proof of Theorem 3-12, $g(x_n)$ converges a.e. as $n$ decreases to $u$. It can be shown that the limit is finite a.e., but this fact will not be needed.

Lemma 10-17: Let $\mathcal{B}$ be a Borel field of subsets of a set $\Omega$, let $S^*$ be a compact metric space, and for each $n$ let $f_n: \Omega \to S^*$ be a function with the property that $f_n^{-1}(E)$ is in $\mathcal{B}$ for all Borel sets $E$. If $f_n(\omega) \to f(\omega)$ for all $\omega$, then $f^{-1}(E)$ is in $\mathcal{B}$ for all Borel sets $E$.

Proof: First consider the case of a compact set $C$. Let $N_\epsilon(C)$ be the open set of all points at a distance less than $\epsilon$ from $C$. Then

    $f^{-1}(C) = \bigcap_{k=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} f_m^{-1}(N_{1/k}(C))$.

Let $\mathcal{C}$ be the class of all Borel sets $E$ for which $f^{-1}(E) \in \mathcal{B}$. Then $\mathcal{C}$ is clearly closed under countable unions and complements. Since $\mathcal{C}$ contains all compact sets, it contains all Borel sets.

Theorem 10-18: Let the chain $P$ with all states transient be started with a distribution $\pi$ such that $\pi N > 0$. For each path $\omega$ let $v(\omega)$ be the supremum of the $n$ such that $x_n(\omega)$ is in $S$. Then a.e. either $v(\omega) < +\infty$ and $x_{v(\omega)}(\omega) \in S$ or $v(\omega) = +\infty$ and $x_n(\omega)$ converges to a point $x_v(\omega)$ of $S^*$ as $n$ tends to infinity. Furthermore, if $\mathcal{G}$ is the least Borel field containing the cylinder sets for $P$, then the set of $\omega$ where $x_v$ is defined is in $\mathcal{G}$ and the inverse image under $x_v$ of any Borel set in $S^*$ is a set of $\mathcal{G}$.

Proof: First we prove the measurability. The set where $x_v$ is defined is the countable union of the sets $\{v = 1\}$, $\{v = 2\}$, . . . and the set $\{v = \infty \wedge x_n(\omega) \text{ converges}\}$. All of these except possibly the last are certainly in $\mathcal{G}$. The last set is

    $A = \{\omega \mid v(\omega) = \infty \wedge \limsup_n K(i, x_n(\omega)) = \liminf_n K(i, x_n(\omega)) \text{ for all } i\}$

and is therefore in $\mathcal{G}$. Now the intersection of $\{v = n\}$ with the inverse image under $x_v$ of a Borel set $E$ is certainly in $\mathcal{G}$. Therefore to complete the proof of the measurability part of the theorem it is sufficient to prove that the intersection of the set $A$ defined above with $x_v^{-1}(E)$ is in $\mathcal{G}$ for every Borel set $E$. In Lemma 10-17 let $\Omega$ be the set $A$ and let the field be the class of sets $A \cap G$, where $G \in \mathcal{G}$. Since $A \cap x_n^{-1}(E)$ is in the field for all $E$, the lemma applies and gives the result immediately.

Next we are to prove the almost-everywhere statement. Form the extended chain associated with $\pi$ and $P$, as described in Section 2 after Corollary 10-7. All statements about this extended chain after time $n \ge 0$ have the same probabilities as the corresponding statements about $P$, and the vector $\nu$ of mean times in the various states is $\pi N$. It is therefore sufficient to show in the extended chain the convergence of $K(i, x_n)$ for all $i$.

Let $\{\hat{x}_n\}$ be the reverse of this extended chain. Since

    $K(i,j) = N_{ij}/N_{\pi j} = \hat{N}_{ji}/N_{\pi i}$,

it suffices to show for each $i$ that in the reverse process $\hat{N}_{\hat{x}_n i}$ converges a.e. as $n$ decreases to $u$. But $\hat{N}_{\cdot i}$ is the $\hat{P}$-potential of a unit charge at $i$. Since the charge has finite support, $\nu$ times it must be finite. Therefore the theorem follows by applying Lemma 10-16 to the potential $\hat{N}_{\cdot i}$ for the reverse process $\{\hat{x}_n\}$.

By Theorem 10-18 the statement that $x_v$ exists (or equivalently that $x_v \in S^*$) and the statement that $x_v \in E$, where $E$ is a Borel set in $S^*$, are both measurable with respect to the least Borel field containing all cylinder sets. But $\Pr_i$ is defined on all such statements. Hence

    $\Pr_i[x_v \in E]$

is defined if $E$ is a Borel set in $S^*$.

From now on, we use the notation of Chapter 2 that $\mathcal{F}$ is the field of cylinder sets for $P$ and $\mathcal{G}$ is the least Borel field containing $\mathcal{F}$.

Corollary 10-19: $\Pr_i[x_v \in S^*] = 1$.

Proof: As in the proof of Theorem 10-18, form the extended chain associated with $P$ and $\pi$. By Theorem 10-18 almost every path in the process $P$ (started according to $\pi$) satisfies $x_v \in S^*$. Hence the same is true of the extended chain, and hence it is true of those paths in the extended chain which pass through $i$. On any path $\omega$ which passes through $i$, $x_v(\omega) \in S^*$ if and only if the same holds for the path watched starting in $\{i\}$. Therefore by Definition 10-4, the extended chain watched starting in $\{i\}$ satisfies

    $\Pr_{\mu^{\{i\}}}[x_v \in S^*] = \mu_i^{\{i\}}$.

On the other hand,

    $\Pr_{\mu^{\{i\}}}[x_v \in S^*] = \mu_i^{\{i\}} \Pr_i[x_v \in S^*]$.

Since $\mu_i^{\{i\}} \ne 0$, we must have $\Pr_i[x_v \in S^*] = 1$.

It is to be emphasized that $S^*$ has been constructed for the fixed starting distribution $\pi$ and that Corollary 10-19 is not the same as Theorem 10-18 restated for the case where $\pi$ assigns measure one to the state $i$: The boundaries for different starting distributions may be different.

5. Poisson-Martin Representation Theorem 

The notation $P$, $\pi$, $K(i, x)$, $S^*$, $\mathcal{F}$, and $\mathcal{G}$ of Sections 3 and 4 is still in force. We shall use $\Pr$ to mean $\Pr_\pi$.

We recall that $\pi K(\cdot, j) = 1$ for all $j$ in $S$. If $j_n \to x$ in $S^*$, then $K(\cdot, j_n) \to K(\cdot, x)$ and, for all $n$, $\pi K(\cdot, j_n) = 1$. Hence $\pi K(\cdot, x) \le 1$ by Fatou's Theorem. Moreover, we know that $K(\cdot, j)$ is $P$-superregular for all $j$ in $S$. If $j_n \to x$, then again by Fatou's Theorem,

    $P K(\cdot, x) = P \lim K(\cdot, j_n) \le \liminf P K(\cdot, j_n) \le \liminf K(\cdot, j_n) = K(\cdot, x)$.

That is, $K(\cdot, x)$ is $P$-superregular for all $x$ in $S^*$. These remarks enable us to prove the following proposition.

Proposition 10-20: If $\nu$ is any Borel measure on $S^*$ with $\nu(S^*) = 1$, then the function $h$, defined by

    $h_i = \int_{S^*} K(i, x)\, d\nu(x)$,

is finite-valued non-negative superregular and satisfies $\pi h \le 1$.

Proof: It is clearly non-negative and is finite-valued because $K(i, \cdot)$ is bounded. By Fubini's Theorem,

    $\pi h = \sum_i \int_{S^*} \pi_i K(i, x)\, d\nu(x) = \int_{S^*} \bigl[\sum_i \pi_i K(i, x)\bigr] d\nu(x) \le \int_{S^*} d\nu(x) = 1$

and

    $(Ph)_i = \sum_j \int_{S^*} P_{ij} K(j, x)\, d\nu(x) = \int_{S^*} \bigl[\sum_j P_{ij} K(j, x)\bigr] d\nu(x) \le \int_{S^*} K(i, x)\, d\nu(x) = h_i$.

Thus Borel measures on $S^*$ give rise to $\pi$-integrable non-negative superregular functions $h$. Our goal in this section will be to prove conversely that every non-negative (finite-valued) superregular function $h$ arises as the integral over $S^*$ of $K(i, x)$ with respect to some measure. We postpone the uniqueness question to Sections 6 and 7.

Throughout the remainder of this chapter we shall use "superregular" to mean "finite-valued superregular."

Harmonic measure $\mu$ is defined on the Borel sets $E$ of $S^*$ by

    $\mu(E) = \Pr[x_v \in E]$.

By Theorem 10-18 the definition of $\mu$ makes sense and $\mu(S^*) = 1$. The complete additivity of $\mu$ is a consequence of the complete additivity of $\Pr$. Thus $\mu$ is a Borel measure. The proposition to follow gives a formula for $\Pr_i[x_v \in E]$ in terms of harmonic measure.

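Proposition 10-21: For every Borel set $E$ of $S^*$,

    $\Pr_i[x_v \in E] = \int_E K(i, x)\, d\mu(x)$.

Proof: Let $E_n$ be a fixed increasing sequence of finite sets of $S$ with union $S$. Let $v_n(\omega)$ be the last time (possibly $+\infty$) that an outcome on the path $\omega$ is in $E_n$, and let $v_n(\omega) = 0$ if no outcome on $\omega$ is in $E_n$. For any starting distribution $\gamma$, Proposition 4-28 implies that

    $\Pr_\gamma[x_{v_n} = j] = \sum_{k=0}^{\infty} \Pr_\gamma[x_k = j \wedge x_m \notin E_n \text{ after time } k]$
    $\quad = \sum_{k=0}^{\infty} (\gamma P^k)_j e_j^{E_n}$
    $\quad = (\gamma N)_j e_j^{E_n}$.

Hence

    $\Pr_i[x_{v_n} = j] = N_{ij} e_j^{E_n} = K(i,j) N_{\pi j} e_j^{E_n} = K(i,j) \Pr_\pi[x_{v_n} = j]$.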
For Borel sets $E$ of $S^*$ define measures by

    $\mu_{in}(E) = \Pr_i[x_{v_n} \in E]$,
    $\mu_i(E) = \Pr_i[x_v \in E]$,
    $\mu_{\pi n}(E) = \Pr_\pi[x_{v_n} \in E]$,

and

    $\mu_\pi(E) = \Pr_\pi[x_v \in E] = \mu(E)$.

What we have just shown is that

    $\mu_{in}(E) = \int_E K(i, x)\, d\mu_{\pi n}(x)$.

Now if $f \ge 0$ is a Borel measurable function on $S^*$, the claim is that

    $\int_{S^*} f(x)\, d\mu_{in}(x) = \int_\Omega f(x_{v_n}(\omega))\, d\Pr_i(\omega)$.

The result for characteristic functions is just the definition of $\mu_{in}$, and for general $f \ge 0$ it follows from the result for simple functions by monotone convergence. Then the result holds for continuous $f \ge 0$ and hence for all continuous $f$. Similarly for continuous $f$,

    $\int_{S^*} f(x)\, d\mu_i(x) = \int_\Omega f(x_v(\omega))\, d\Pr_i(\omega)$,

where we set $f(x_v(\omega)) = 0$ when $x_v$ is not defined.

As $n \to \infty$, $x_{v_n}(\omega) \to x_v(\omega)$ a.e. $[\Pr_i]$ by Corollary 10-19. When $f$ is continuous, $f(x_{v_n}(\omega)) \to f(x_v(\omega))$ a.e. $[\Pr_i]$. Since continuous functions are bounded, we have

    $\lim_n \int_\Omega f(x_{v_n}(\omega))\, d\Pr_i(\omega) = \int_\Omega f(x_v(\omega))\, d\Pr_i(\omega)$

by dominated convergence. Hence

    $\lim_n \int_{S^*} f(x)\, d\mu_{in}(x) = \int_{S^*} f(x)\, d\mu_i(x)$

for all continuous $f$. Similarly

    $\lim_n \int_{S^*} f(x)\, d\mu_{\pi n}(x) = \int_{S^*} f(x)\, d\mu_\pi(x)$.

Since $K(i, \cdot)$ is continuous, so is $f(\cdot) K(i, \cdot)$. Therefore

    $\lim_n \int_{S^*} f(x) K(i,x)\, d\mu_{\pi n}(x) = \int_{S^*} f(x) K(i, x)\, d\mu_\pi(x)$

for all continuous $f$. Since $\mu_{in}(E) = \int_E K(i, x)\, d\mu_{\pi n}(x)$, we obtain

    $\int_{S^*} f(x)\, d\mu_i(x) = \int_{S^*} f(x) K(i, x)\, d\mu_\pi(x)$

for all continuous $f$. Therefore

    $\mu_i(E) = \int_E K(i, x)\, d\mu_\pi(x)$.

Corollary 10-22: $\int_{S^*} K(i, x)\, d\mu(x) = 1$ for all $i \in S$.

Proof: Since $\Pr_i[x_v \in S^*] = 1$ by Corollary 10-19, the result follows from Proposition 10-21.

The corollary we have just proved is the representation theorem as it applies to the column vector which is identically one. We shall be able to get the general case by applying the corollary to a suitable modification of the $h$-process introduced in Chapter 8. We now re-define the $h$-process in such a way that we allow $h$ to have some entries equal to zero.

Definition 10-23: If $h \ge 0$ is a finite-valued $P$-superregular function such that $\pi h = 1$, the $h$-process is defined to be the Markov chain with state space $S$ and with the measure $\Pr^h$ defined by

    $\Pr^h[x_0 = c \wedge x_1 = d \wedge x_2 = e \wedge \cdots \wedge x_{n-1} = i \wedge x_n = j] = \pi_c P_{cd} P_{de} \cdots P_{ij} h_j$.

We readily check that the $h$-process is indeed a Markov chain. If we define $S^h$ by

    $S^h = \{i \in S \mid h_i > 0\}$,

then the transition matrix of the $h$-process satisfies

    $P_{ij}^h = P_{ij} h_j / h_i$    for $i$ and $j$ in $S^h$,
    $P_{ij}^h = 0$    for $i \in S^h$ and $j \in S - S^h$,

and the starting vector satisfies

    $\pi_i^h = \pi_i h_i$ for all $i$.

If $i$ is in $S - S^h$, then $P_{ij}^h$ is not defined and we shall agree to take it to be zero. With this definition we compute directly that the fundamental matrix satisfies

    $N_{ij}^h = N_{ij} h_j / h_i$    if $i$ and $j$ are in $S^h$,
    $N_{ij}^h = \delta_{ij}$    otherwise.

Hence $P^h$ has only transient states.
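
The formulas for the h-process can be checked on a finite chain. In the sketch below the chain P, the vector pi, the charge f, and the helper names are hypothetical; h is taken to be a normalized potential N f, which is one convenient way (our choice, not the only one) to obtain a strictly positive superregular function with pi h = 1.

```python
from fractions import Fraction as F

def inverse(A):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [A[r][:] + [F(int(r == c)) for c in range(n)] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def fundamental(P):
    """N = (I - P)^{-1}, the matrix of mean numbers of visits."""
    n = len(P)
    return inverse([[F(int(i == j)) - P[i][j] for j in range(n)]
                    for i in range(n)])

# Hypothetical transient chain and starting vector.
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(1, 4), F(1, 4), F(0)]]
pi = [F(1, 2), F(1, 2), F(0)]
n = len(P)
N = fundamental(P)
piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]
K = [[N[i][j] / piN[j] for j in range(n)] for i in range(n)]

# A normalized superregular h: the potential of a charge f, scaled so pi h = 1.
f = [F(1), F(0), F(2)]
g = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]
c = sum(pi[i] * g[i] for i in range(n))
h = [x / c for x in g]

# h-process: P^h_ij = P_ij h_j / h_i and pi^h_i = pi_i h_i.
P_h = [[P[i][j] * h[j] / h[i] for j in range(n)] for i in range(n)]
pi_h = [pi[i] * h[i] for i in range(n)]
N_h = fundamental(P_h)
piN_h = [sum(pi_h[i] * N_h[i][j] for i in range(n)) for j in range(n)]
K_h = [[N_h[i][j] / piN_h[j] for j in range(n)] for i in range(n)]
```

In this example the fundamental matrix computed from P_h coincides with N_ij h_j / h_i, and the kernel of the h-process is K(i,j)/h_i, in agreement with the formulas stated for states where h is positive.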







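Lemma 10-24: If $h \ge 0$ is a $P$-superregular function and if $h_i = 0$ and $h_j > 0$, then $N_{ij} = 0$. If, in addition, $\pi h = 1$, then $(\pi^h N^h)_j = h_j (\pi N)_j$, and $(\pi^h N^h)_j > 0$ if and only if $j$ is in $S^h$. For $i$ and $j$ in $S^h$,

    $K^h(i,j) = K(i,j)/h_i$.

Proof: If $h_i = 0$ and $h_j > 0$, then for every $n$

    $0 = h_i \ge \sum_k (P^n)_{ik} h_k \ge (P^n)_{ij} h_j$

and hence $(P^n)_{ij} = 0$. Therefore $N_{ij} = \sum_n (P^n)_{ij} = 0$. Consequently, for $j$ in $S^h$,

    $(\pi^h N^h)_j = \sum_{i \in S^h} \pi_i h_i (N_{ij} h_j / h_i) = h_j \sum_{i \in S^h} \pi_i N_{ij} = h_j \sum_{i \in S} \pi_i N_{ij}$,

while for $j$ not in $S^h$ both sides of the asserted identity are zero. By assumption $\pi$ is a vector such that $\pi N$ is strictly positive. Therefore $(\pi^h N^h)_j = 0$ if and only if $h_j = 0$.

Finally, according to the convention at the end of Section 3 and the calculation just completed, $K^h(i,j)$ is defined if $i$ and $j$ are in $S^h$. We have

    $K^h(i,j) = N_{ij}^h / (\pi^h N^h)_j = (N_{ij} h_j / h_i) / (h_j N_{\pi j}) = K(i,j)/h_i$.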

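Since the $h$-process has the property that $(\pi^h N^h)_j$ is positive exactly when $j$ is in $S^h$, we can, as noted at the end of Section 3, define a metric $d^h$ on $S^h \times S^h$ and we can form the Cauchy completion $S^{h*}$ with boundary $B^h$. We shall agree to use the same weights $w_i$ in defining $d^h$ that were used in defining $d$.

Lemma 10-25: The identity map from $(S^h, d^h)$ into $(S, d)$ is an isometry.

Proof: Let $j$ and $j'$ be in $S^h$. Then

    $d^h(j,j') = \sum_{i \in S^h} w_i N_{\pi i}^h (N_{ii}^h)^{-1} |K^h(i,j) - K^h(i,j')|$
    $\quad = \sum_{i \in S^h} w_i N_{\pi i} N_{ii}^{-1} |K(i,j) - K(i,j')|$
    $\quad = \sum_{i \in S} w_i N_{\pi i} N_{ii}^{-1} |K(i,j) - K(i,j')|$
    $\quad = d(j,j')$.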

It follows from Lemma 10-25 that $S^{h*}$ can be canonically identified with a compact subset of $S^*$. Thus by continuity, $K^h(i, x) = K(i, x)/h_i$ for all $i$ in $S^h$ and $x$ in $S^{h*}$. Harmonic measure for the $h$-process will be denoted $\mu^h$. We can consider it to be defined on $S^*$ (as well as on $S^{h*}$) if we set

    $\mu^h(E) = \mu^h(E \cap S^{h*})$

for Borel sets $E$ in $S^*$.

We are finally in a position to state and prove the existence half of 
the Markov chain analog of the Poisson-Martin Representation 
Theorem. 

Theorem 10-26: If $h \ge 0$ is a finite-valued $P$-superregular function such that $\pi h = 1$, then

    $h_i = \int_{S^*} K(i, x)\, d\mu^h(x)$.

Proof: Applying Corollary 10-22 to the $h$-process, we have

    $\int_{S^{h*}} K^h(i, x)\, d\mu^h(x) = 1$

for $i$ in $S^h$. That is, for $i$ in $S^h$

    $h_i = \int_{S^{h*}} K(i, x)\, d\mu^h(x) = \int_{S^*} K(i, x)\, d\mu^h(x)$.

Now if $i \in S - S^h$, $N_{ij} = 0$ for all $j \in S^h$ by Lemma 10-24. Thus $K(i,j) = 0$ for such $i$ and $j$. Since $K(i, x)$ is continuous on $S^{h*}$, $K(i, x) = 0$ for $i \in S - S^h$ and $x \in S^{h*}$. Therefore for such $i$,

    $h_i = 0 = \int_{S^{h*}} K(i, x)\, d\mu^h(x) = \int_{S^*} K(i, x)\, d\mu^h(x)$.

Of course, the representation theorem immediately extends to cover all $P$-superregular functions $h \ge 0$ for which $\pi h$ is positive and finite. However, the probabilistic interpretation of the measure is lost. We shall return to this point in Theorem 10-41 of Section 7 after proving the uniqueness theorem.
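
For a finite state space the representation can be verified by hand, since the completion adds no points and the integral reduces to a finite sum over S. The following sketch rests on assumptions that should be read as such: the chain P, the vector pi, and the charge f are hypothetical, and harmonic measure of the h-process is computed as the distribution of the last state visited, mu^h(j) = (pi^h N^h)_j (1 - sum_k P^h_jk), which simplifies to (pi N)_j (h_j - (Ph)_j).

```python
from fractions import Fraction as F

def inverse(A):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [A[r][:] + [F(int(r == c)) for c in range(n)] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical transient chain and starting vector.
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(1, 4), F(1, 4), F(0)]]
pi = [F(1, 2), F(1, 2), F(0)]
n = len(P)
N = inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])
piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]
K = [[N[i][j] / piN[j] for j in range(n)] for i in range(n)]

def represent(h):
    """Return (mu_h, rebuilt) for a normalized superregular h.

    mu_h[j] is the probability that the h-process disappears from state j;
    rebuilt[i] is sum_j K(i, j) mu_h[j], which should reproduce h[i].
    """
    Ph = [sum(P[j][k] * h[k] for k in range(n)) for j in range(n)]
    mu_h = [piN[j] * (h[j] - Ph[j]) for j in range(n)]
    rebuilt = [sum(K[i][j] * mu_h[j] for j in range(n)) for i in range(n)]
    return mu_h, rebuilt

# Case h = 1: the ordinary chain and its harmonic measure.
one = [F(1)] * n
mu, rebuilt_one = represent(one)

# Case h = N f / (pi N f): a normalized potential with hypothetical charge f.
f = [F(1), F(0), F(2)]
g = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]
c = sum(pi[i] * g[i] for i in range(n))
h = [x / c for x in g]
mu_h, rebuilt_h = represent(h)
```

In both cases the measure has total mass one and the weighted sum of kernel columns returns the original function. For an infinite chain the same identity holds only after mass on the boundary is included, which is the content of the theorem.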

6. Extreme points of the boundary 

The measure $\mu^h$ is not necessarily the unique Borel measure which represents $h$ in the sense of Theorem 10-26, and we consequently need another hypothesis to get uniqueness. What we shall do in this section is to define the set $B_e$ of extreme points of the boundary and the subset $\hat{S} = S \cup B_e$ of $S^*$. In Section 7 we shall see that $\mu^h$ has all its mass on $\hat{S}$ and that $\mu^h$ is the unique measure with all its mass on $\hat{S}$ for which the representation in Theorem 10-26 holds.

There are three kinds of behavior of points of the boundary that we shall want to exclude:

(1) $x$ has the property that $\pi K(\cdot, x) < 1$.

(2) $x$ has the property that $K(\cdot, x)$ is not regular.

(3) $x$ has the property that $K(\cdot, x)$ is regular but is a nontrivial convex combination of other non-negative regular functions.

The first two of these possibilities are the topic of the two lemmas to follow. The third possibility will require more of our attention, and we discuss it beginning with Definition 10-29 and Lemma 10-30.

We recall that $\pi K(\cdot, x) \le 1$ for all $x$ in $S^*$.

Lemma 10-27: For almost every $x$ $[\mu]$ in $S^*$, $\pi K(\cdot, x) = 1$. The set where the equality holds is a Borel set.

Proof: For each $i$, the function $K(i, x)$ is continuous. Hence the countable sum $\pi K(\cdot, x)$ is Borel measurable. Therefore the set where it equals one is a Borel set.

By Corollary 10-22,

    $\int_{S^*} K(i, x)\, d\mu(x) = 1$.

Thus by Fubini's Theorem,

    $1 = \pi 1 = \pi \int_{S^*} K(\cdot, x)\, d\mu(x) = \int_{S^*} \pi K(\cdot, x)\, d\mu(x)$.

But $\int_{S^*} 1\, d\mu(x) = 1$ also, and since $1 - \pi K(\cdot, x) \ge 0$, we conclude that $\pi K(\cdot, x) = 1$ a.e. by Corollary 1-40.

We say that a function $h$ is normalized if $\pi h = 1$. By Lemma 10-27, $K(\cdot, x)$ is normalized for a.e. $x$ $[\mu]$.

We recall that $K(\cdot, x)$ is $P$-superregular for all $x \in S^*$.

Lemma 10-28: For almost every $x$ $[\mu]$ in the boundary $B$ of $S^*$, the function $K(\cdot, x)$ is regular. The set where it is regular is a Borel set.

Proof: The set where $\sum_j P_{ij} K(j, x) = K(i, x)$ is a Borel set since it is the set where a Borel measurable function takes on the value $K(i, x)$. The set where $K(\cdot, x)$ is regular is the countable intersection of these sets.

By Theorem 4-10 with the random time identically equal to one, we see that the column vector whose $i$th component is (see Proposition 10-21)

    $\int_B K(i, x)\, d\mu(x) = \Pr_i[x_v \in B]$

is $P$-regular. By this observation and by Fubini's Theorem, we have

    $\int_B P K(\cdot, x)\, d\mu(x) = P \int_B K(\cdot, x)\, d\mu(x) = \int_B K(\cdot, x)\, d\mu(x)$.

Since $P K(\cdot, x) \le K(\cdot, x)$, we must have $P K(\cdot, x) = K(\cdot, x)$ a.e. by Corollary 1-40.

Definition 10-29: A finite-valued function $h \ge 0$ is minimal if

(1) $h$ is regular, and

(2) whenever $0 \le h' \le h$ with $h'$ regular, then $h' = ch$.



Lemma 10-30: A normalized finite-valued regular function $h \ge 0$ is minimal if and only if it cannot be written as a nontrivial convex combination of two distinct normalized non-negative regular functions.

Proof: If $h = c_1 h^{(1)} + c_2 h^{(2)}$ is such a convex combination, then either $h^{(1)}$ or $h^{(2)}$ is not equal to $h$; say $h^{(1)} \ne h$. Since $h \ge c_1 h^{(1)}$, we must have $c_1 h^{(1)} = ch$ if $h$ is minimal. Multiplying through by $\pi$, we obtain $c_1 = c$. Since $c_1 \ne 0$, we conclude $h^{(1)} = h$, contradiction.

Conversely, if $h \ge h' \ge 0$ with $h'$ regular and $h'$ not equal to $0$ or $h$, then

    $h = (\pi h') \left[ \frac{h'}{\pi h'} \right] + [\pi(h - h')] \left[ \frac{h - h'}{\pi(h - h')} \right]$

exhibits $h$ as a nontrivial convex combination of normalized regular functions, provided we can prove $0 < \pi h' < 1$. If so, then by hypothesis the two normalized functions must be equal to each other and hence equal to $h$. That is, $h' = (\pi h') h$. Thus we are to prove $0 < \pi h' < 1$. Let $h'_j > 0$. Since $(\pi N)_j > 0$, choose $n$ so that $(\pi P^n)_j > 0$. Since $h'$ is superregular, $h' \ge P^n h'$ and hence $\pi h' \ge \pi P^n h' \ge (\pi P^n)_j h'_j > 0$. A similar argument applied to $h - h'$ shows that $\pi h' < 1$.



Definition 10-31: A point $x$ in $S^*$ is an extreme point of the boundary if the function $K(\cdot, x)$ is minimal and normalized. The set of extreme points is denoted $B_e$. Let $\hat{S} = S \cup B_e$.

Since $K(\cdot, j)$ is not regular when $j \in S$, no point of $S$ can be in $B_e$, and $B_e$ must be entirely contained in the boundary $B$. The set $\hat{S}$ is the subset of $S^*$ with respect to which the uniqueness theorem will be stated. We shall see eventually that $\hat{S}$ is a Borel set and that $\mu(\hat{S}) = 1$ (compare with Lemmas 10-27 and 10-28).

If we form an $h$-process, we know that $\hat{S}^h \subset S^{h*} \subset S^*$. The following lemma strengthens this conclusion and shows that actually $\hat{S}^h \subset \hat{S}$.

Lemma 10-32: Let $h \ge 0$ be a finite-valued normalized $P$-superregular function. If $x$ is in $S^{h*}$, then

(1) $K^h(\cdot, x)$ is normalized if and only if $K(\cdot, x)$ is normalized.

(2) $K^h(\cdot, x)$ is regular for $P^h$ restricted to $S^h$ if and only if $K(\cdot, x)$ is $P$-regular.

(3) $K^h(\cdot, x)$ is minimal for $P^h$ restricted to $S^h$ if and only if $K(\cdot, x)$ is minimal for $P$.

Hence $B_e^h = B_e \cap B^h$ and $\hat{S}^h \subset \hat{S}$.

Proof: Conclusions (1) and (2) follow from the identities

    $\sum_{i \in S^h} \pi_i^h K^h(i, x) = \pi K(\cdot, x)$

    $h_i \sum_{j \in S^h} P_{ij}^h K^h(j, x) = [P K(\cdot, x)]_i$ for $i \in S^h$,

both of which use the fact that $K(i, x) = 0$ if $i$ is not in $S^h$ (see the proof of Theorem 10-26).

Thus in (3) we may assume that $K^h(\cdot, x)$ and $K(\cdot, x)$ are both regular. Multiplying both by the same constant, if necessary, we may assume for the purposes of this proof that they are normalized. We shall use Lemma 10-30 and show that a nontrivial decomposition exists for $K(\cdot, x)$ if and only if a nontrivial decomposition exists for $K^h(\cdot, x)$. In fact, if for $i \in S$

K{i, x) = -f 

nontrivially, then for i g 

/>(!) ^(2) 

X) = K(i, x)lh, = Cl ^ + C2 
where the sums over $S^h$ can be replaced by sums over $S$ because
ie8 — 8^ implies 0 < < K(i,x)jci = 0 or = 0. Consequently 

we may assume that 

hi hi 

for ie8^. That is, = Ap> for ie8^. But h\^^ = Ap) = 0 for 
ie8 ^ 8^, Hence = h^^\ 

Conversely, if for i e 8^ 

K^(i, X) = c^hf^ + c^h\^\ 

then 

K{i, x) — cji^^^hi + cjt^^^hi 

for ie8^. Extend and to be defined for all ie8 by 

setting them equal to zero for i g >S — 8^. The convex sum of them is 
still K(i, x), and they are both regular normalized functions, since 

2 Puhf% = 2 

i jeS^ 

and 

2 = 2 = 1 . 

i ieS^ 

Consequently we may assume that for all i g8. That is, 

= h\^'^ for all i e8^. 

We now begin to derive properties of the set of extreme points.

Lemma 10-33: If $h \ge 0$ is a normalized minimal function such that

    $h_i = \int_{S^*} K(i,x)\, d\nu(x)$

for a Borel measure $\nu$ with $\nu(S^*) = 1$, then $\nu$ is concentrated at a single point and that point is extreme.

Proof: Consider the functions

    $h_i^A = \frac{1}{\nu(A)} \int_A K(i,x)\, d\nu(x)$ and $h_i^{\tilde{A}} = \frac{1}{\nu(\tilde{A})} \int_{\tilde{A}} K(i,x)\, d\nu(x)$

as $A$ ranges through the Borel sets with $0 < \nu(A) < 1$, where $\tilde{A}$ denotes the complement of $A$. For any such $A$, $h^A$ and $h^{\tilde{A}}$ are superregular and satisfy $\pi h^A \le 1$ and $\pi h^{\tilde{A}} \le 1$ by Proposition 10-20. But

    $h = \nu(A) h^A + \nu(\tilde{A}) h^{\tilde{A}}$

with $h$ regular and normalized and with $\nu(A) + \nu(\tilde{A}) = 1$. Hence $h^A$ and $h^{\tilde{A}}$ must both be regular and normalized. By Lemma 10-30, $h = h^A = h^{\tilde{A}}$. Thus, if $0 < \nu(A) < 1$,

    $\nu(A) h_i = \int_A K(i, x)\, d\nu(x)$

for each fixed $i$. The same is trivially true if $\nu(A) = 0$, and it is true by hypothesis if $\nu(A) = 1$. Hence it is true of all Borel sets. Therefore for fixed $i$,

    $K(i, x) = h_i$ a.e. $[\nu]$.

Thus $K(i, x) = h_i$ for all $i$ almost everywhere $[\nu]$. Since $\nu(S^*) > 0$, there is at least one point $x_0$ where it is true. We have

    $K(i, x_0) = h_i$

for all $i$. If there were another such point $x'$, then we would have $K(\cdot, x_0) = K(\cdot, x')$, and hence $x_0 = x'$ by Proposition 10-14. Therefore the complement of $\{x_0\}$ has measure zero, or $\nu$ is concentrated at $x_0$. Now, we know that $K(\cdot, x_0) = h$ and $h$ is normalized and minimal. Hence $x_0$ is extreme.

Lemma 10-34: If $h$, $h^{(1)}$, and $h^{(2)}$ are normalized non-negative superregular functions with

    $h = c_1 h^{(1)} + c_2 h^{(2)}$,

where $c_1 \ge 0$ and $c_2 \ge 0$, then $\mu^h = c_1 \mu^{h^{(1)}} + c_2 \mu^{h^{(2)}}$.

Proof: For a typical basic statement $x_0 = i \wedge x_1 = j \wedge x_2 = k$, we have, by Definition 10-23,

    $\Pr^h[x_0 = i \wedge x_1 = j \wedge x_2 = k] = \pi_i P_{ij} P_{jk} h_k$.

Using the analogous identities for $h^{(1)}$ and $h^{(2)}$ and breaking up $h_k$ as $c_1 h_k^{(1)} + c_2 h_k^{(2)}$, we obtain

    $\Pr^h[x_0 = i \wedge x_1 = j \wedge x_2 = k] = c_1 \Pr^{h^{(1)}}[x_0 = i \wedge x_1 = j \wedge x_2 = k] + c_2 \Pr^{h^{(2)}}[x_0 = i \wedge x_1 = j \wedge x_2 = k]$.

Hence the same is true of all statements in $\mathcal{F}$, and by the uniqueness half of Theorem 1-19 we find that

    $\Pr^h[p] = c_1 \Pr^{h^{(1)}}[p] + c_2 \Pr^{h^{(2)}}[p]$

for all statements $p$ measurable relative to $\mathcal{G}$.

Take $p$ to be the statement $x_v \in E$, where $E$ is a Borel set of $S^*$. The claim is that $x_v^h \in E$ if and only if $x_v \in E$, and similarly for $h^{(1)}$ and $h^{(2)}$. For each $n$ we certainly have $x_n^h = x_n$. Hence $x_v^h = x_v$ when $v$ is finite. When $v$ is infinite, we have $x_v^h = x_v$ because $x_n^h = x_n$ for all $n$ and because $S^{h*}$ is isometric with a subset of $S^*$. Therefore

    $\Pr^h[x_v \in E] = c_1 \Pr^{h^{(1)}}[x_v \in E] + c_2 \Pr^{h^{(2)}}[x_v \in E]$

or

    $\mu^h = c_1 \mu^{h^{(1)}} + c_2 \mu^{h^{(2)}}$.

Proposition 10-35: Let $h = K(\cdot, x)$. Then $x \in \hat{S}$ if and only if $h$ is normalized and $\mu^h(\{x\}) = 1$.

Proof: First let $x \in S$. Then $h$ is certainly normalized, and hence the $h$-process is defined. Suppose we can prove that the $h$-process disappears with probability one. Then by definition of $\mu^h$, $\mu^h(S) = 1$. Hence by Theorem 10-26,

    $K(\cdot, x) = \int_{S^*} K(\cdot, y)\, d\mu^h(y) = \int_S K(\cdot, y)\, d\mu^h(y) = \sum_j N_{\cdot j} \left( \frac{\mu^h(\{j\})}{N_{\pi j}} \right)$.

But $K(\cdot, x) = N_{\cdot x}/N_{\pi x} = \sum_j N_{\cdot j} (\delta_{jx}/N_{\pi x})$. By Theorem 8-4,

    $\frac{\mu^h(\{j\})}{N_{\pi j}} = \frac{\delta_{jx}}{N_{\pi x}}$ for all $j$.

That is, $\mu^h(\{x\}) = 1$. Thus we are to prove that the $h$-process disappears with probability one. By the remarks following Definition 8-13, it suffices to show that $1 = N^h f$ for some $f$. Take $f$ to be a single mass $1/(N_{\pi x} h_x)$ at $x$, and the equality follows.

Next let $x \in B_e$. Then $h$ is normalized by definition. By Theorem 10-26,

    $K(\cdot, x) = \int_{S^*} K(\cdot, y)\, d\mu^h(y)$.

In Lemma 10-33 take $\nu$ to be $\mu^h$. Then $\mu^h(\{x_0\}) = 1$ for some $x_0$. But then $K(\cdot, x) = K(\cdot, x_0)$, and hence $x = x_0$ by Proposition 10-14.

Conversely, suppose $h$ is normalized and $\mu^h(\{x\}) = 1$. If $x \notin S$, then $x \in B$, and by Lemmas 10-28 and 10-32 (conclusion 2), $h$ must be regular. It remains to show that $h$ is minimal. If $h = c_1 h^{(1)} + c_2 h^{(2)}$, then by Lemma 10-34

    $\mu^h = c_1 \mu^{h^{(1)}} + c_2 \mu^{h^{(2)}}$.

Hence $\mu^{h^{(1)}}$ must put all its weight on $x$, and therefore $h = h^{(1)}$. By Lemma 10-30 we conclude that $h$ is minimal.

7. Uniqueness of the representation

In this section we retain all of the notations of Sections 3 through 6. We are going to prove that $\mu^h$ is the unique Borel measure concentrated on $\hat{S}$ for which the representation in Theorem 10-26 holds. The first step will be to show that $\hat{S}$ is a Borel set of harmonic measure one.

We denote by $S_N$ the set of points $x$ in $S^*$ for which $K(\cdot, x)$ is normalized. These are exactly the points for which the $h$-process with $h = K(\cdot, x)$ has been defined. By Lemma 10-27, $S_N$ is a Borel set of harmonic measure one. Since $S_N$ is a Borel subset of $S^*$, the notion of a Borel measurable function on $S_N$ is well defined.

Lemma 10-36: If $A$ is a fixed $\mathcal{G}$-measurable set in $\Omega$, then $\Pr^{K(\cdot,x)}[\omega \in A]$ is a Borel measurable function of $x$ in $S_N$.

Proof: By Definition 10-23,

    $\Pr^{K(\cdot,x)}[x_0 = i \wedge x_1 = j \wedge x_2 = k] = \pi_i P_{ij} P_{jk} K(k, x)$,

and the right side is continuous even for $x$ in $S^*$. Since any cylinder set is the countable disjoint union of basic cylinder sets, the function $\Pr^{K(\cdot,x)}[\omega \in A]$ for $A \in \mathcal{F}$ is a denumerable sum of such functions and hence is Borel measurable.

Let $\mathcal{C}$ be the class of all sets $A$ in $\Omega$ for which $\Pr^{K(\cdot,x)}[\omega \in A]$ is Borel measurable. If $\{A_n\}$ and $\{B_n\}$ are, respectively, increasing and decreasing sequences of such sets, then

    $\Pr^{K(\cdot,x)}[\bigcup_n A_n] = \lim_n \Pr^{K(\cdot,x)}[A_n]$

and

    $\Pr^{K(\cdot,x)}[\bigcap_n B_n] = \lim_n \Pr^{K(\cdot,x)}[B_n]$,

the latter equality holding since $\Pr^{K(\cdot,x)}$ is a finite measure. Hence $\bigcup_n A_n$ and $\bigcap_n B_n$ are both in $\mathcal{C}$. By the Monotone Class Lemma (see Halmos [1950], pp. 27-28), $\mathcal{C}$ contains $\mathcal{G}$.

Lemma 10-37: If $A$ is in $\mathcal{G}$ and if $C$ is a Borel set in $S_N$, then

    $\Pr[\omega \in A \wedge x_v \in C] = \int_C \Pr^{K(\cdot,x)}[A]\, d\mu(x)$.

Proof: If $p$ is the typical basic statement $x_0 = i \wedge x_1 = j \wedge x_2 = k$, then

    $\Pr[p \wedge x_v \in C] = \Pr[p] \cdot \Pr[x_v \in C \mid p] = \pi_i P_{ij} P_{jk} \Pr_k[x_v \in C]$

by Theorem 4-9. By Proposition 10-21 and by Definition 10-23, this expression is equal to

    $\pi_i P_{ij} P_{jk} \int_C K(k, x)\, d\mu(x) = \int_C \pi_i P_{ij} P_{jk} K(k, x)\, d\mu(x) = \int_C \Pr^{K(\cdot,x)}[p]\, d\mu(x)$.

By Lemma 10-36 the function $\Pr^{K(\cdot,x)}[A]$ is a Borel measurable function of $x$ if $A$ is in $\mathcal{G}$. Fixing $C$, we may therefore define a set function $\alpha$ by

    $\alpha(A) = \int_C \Pr^{K(\cdot,x)}[A]\, d\mu(x)$.

Then $\alpha$ is certainly non-negative, and it is completely additive by the Monotone Convergence Theorem. The calculation above shows that the two set functions $\alpha(A)$ and $\Pr[\omega \in A \wedge x_v \in C]$ agree on basic cylinder sets and hence on all of the field $\mathcal{F}$. By Theorem 1-19 they must agree on all of $\mathcal{G}$.



Proposition 10-38: The set $S \cup B_e$ is a Borel set with $\mu(S \cup B_e) = 1$.

Proof: Applying Lemma 10-37 to the statement $x_v \in D$, where $D$ is
a Borel set of $S^*$, we have for any Borel subset $C$ of $S_N$

$$\Pr[x_v \in D \wedge x_v \in C] = \int_C \Pr^{K(\cdot,x)}[x_v \in D]\,d\mu(x) = \int_C \mu^{K(\cdot,x)}(D)\,d\mu(x).$$

But by definition

$$\Pr[x_v \in D \wedge x_v \in C] = \mu(D \cap C) = \int_C \chi_D(x)\,d\mu(x).$$

The set on which two Borel measurable functions agree is a Borel set.
Hence for fixed $D$ the set of $x$'s on which

$$\mu^{K(\cdot,x)}(D) = \chi_D(x)$$

holds is a Borel set in $S_N$. Since these two functions have the same
$\mu$-integral over all Borel sets $C$, they must be equal a.e. $[\mu]$.

We shall let $D$ range over the intersection with $S_N$ of all balls with
centers in $S$ and with rational radii. Let $\mathcal{D}$ be the collection of such
balls and let

$$T = \{x \in S_N \mid \mu^{K(\cdot,x)}(D) = \chi_D(x) \text{ for all } D \in \mathcal{D}\}.$$

Remembering that $S_N$ is a Borel set of harmonic measure one, we see
that $T$ is the denumerable intersection of Borel sets of measure one and
is therefore a Borel set with $\mu(T) = 1$. We show $T = S \cup B_e$.

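First let $x \in T$. Choose $s_n$ in $S$ with $d(x, s_n) < 1/n$, and let $D_n$ be
the intersection with $S_N$ of the ball with center $s_n$ and radius $1/n$. By
assumption

$$\mu^{K(\cdot,x)}(D_n) = \chi_{D_n}(x) = 1$$

for all $n$. Hence

$$\mu^{K(\cdot,x)}(\{x\}) = 1.$$

Therefore $x \in S \cup B_e$ by Proposition 10-35.

Conversely, if $x \in S \cup B_e$, then

$$\mu^{K(\cdot,x)}(\{x\}) = 1$$

by Proposition 10-35, and so

$$\mu^{K(\cdot,x)}(D) = \chi_D(x)$$

for all Borel sets $D$ in $S_N$. Therefore $x \in T$, and $T = S \cup B_e$.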

Lemma 10-39: Let $h$ be defined by

$$h_i = \int_{S \cup B_e} K(i, x)\,d\nu(x),$$

where $\nu$ is a measure with $\nu(S \cup B_e) = 1$. Then the $h$-process is well defined,
and for any $A$ in $\mathcal{G}$

$$\Pr^h[A] = \int_{S \cup B_e} \Pr^{K(\cdot,x)}[A]\,d\nu(x).$$



Proof: By Proposition 10-20, $h$ is non-negative superregular. By
Fubini's Theorem $\pi h = 1$ since $\pi K(\cdot, x) = 1$ for all $x$ in $S \cup B_e$. Hence the
$h$-process is well defined. If $p$ is a typical basic statement,

$$x_0 = i \wedge x_1 = j \wedge x_2 = k,$$

then

$$\Pr^h[p] = \pi_i P_{ij} P_{jk} h_k = \int_{S \cup B_e} \pi_i P_{ij} P_{jk} K(k, x)\,d\nu(x) = \int_{S \cup B_e} \Pr^{K(\cdot,x)}[p]\,d\nu(x).$$

Proceeding as in Lemma 10-37, we define $\alpha$ by

$$\alpha(A) = \int_{S \cup B_e} \Pr^{K(\cdot,x)}[A]\,d\nu(x).$$

Then $\alpha$ is a finite measure on $\mathcal{G}$ which agrees with the measure $\Pr^h[A]$
on $\mathcal{F}$. By Theorem 1-19,

$$\Pr^h[A] = \alpha(A)$$

for all $A$ in $\mathcal{G}$.
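The identity for basic statements used in this proof can be checked numerically. The sketch below is only an illustration: the three-state matrix `P`, the starting vector `pi`, and the superregular function `h` are hypothetical values chosen for convenience, and the $h$-process is taken to have transition matrix $P^h_{ij} = P_{ij}h_j/h_i$ and starting vector $(\pi_i h_i)$.

```python
from fractions import Fraction as F

# Hypothetical 3-state transient (substochastic) chain, for illustration only.
P = [[F(0), F(1, 2), F(0)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 2), F(0)]]
pi = [F(1, 2), F(1, 4), F(1, 4)]     # starting vector with pi 1 = 1
h = [F(8, 9), F(4, 3), F(8, 9)]      # positive, superregular (P h <= h), pi h = 1


def h_process(P, h):
    """Transition matrix of the h-process: P^h_ij = P_ij * h_j / h_i."""
    n = len(P)
    return [[P[i][j] * h[j] / h[i] for j in range(n)] for i in range(n)]


def prob_in_h_process(i, j, k):
    """Pr^h[x0 = i, x1 = j, x2 = k], computed inside the h-process."""
    Ph = h_process(P, h)
    return pi[i] * h[i] * Ph[i][j] * Ph[j][k]


def prob_closed_form(i, j, k):
    """The closed form pi_i P_ij P_jk h_k that appears in the proof."""
    return pi[i] * P[i][j] * P[j][k] * h[k]
```

Exact rational arithmetic is used so that the two expressions can be compared with `==` rather than with a tolerance.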



The theorem to follow is the uniqueness theorem mentioned at the 
beginning of this section. 

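Theorem 10-40: If $h \ge 0$ is a normalized superregular function such
that

$$h_i = \int_{S \cup B_e} K(i, x)\,d\nu(x)$$

for some measure $\nu$ on $S^*$ with $\nu(S \cup B_e) = \nu(S^*) = 1$, then $\nu = \mu^h$.

Proof: Write $\hat S$ for $S \cup B_e$ in this proof. By Lemma 10-39,

$$\mu^h(C \cap \hat S) = \Pr^h[x_v \in C \cap \hat S] = \int_{\hat S} \Pr^{K(\cdot,x)}[x_v \in C \cap \hat S]\,d\nu(x).$$

But by Proposition 10-35,

$$\Pr^{K(\cdot,x)}[x_v \in C \cap \hat S] = \mu^{K(\cdot,x)}(C \cap \hat S) = \chi_{C \cap \hat S}(x).$$

Hence for all Borel sets $C \cap \hat S$

$$\mu^h(C \cap \hat S) = \int_{\hat S} \chi_{C \cap \hat S}(x)\,d\nu(x) = \nu(C \cap \hat S).$$

Since $\nu(\hat S) = 1$, we must have $\mu^h(\hat S) = 1$ and hence $\mu^h(C) = \nu(C)$ for
all Borel sets $C$ in $S^*$.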



We have not yet proved that the representation in Theorem 10-40
does hold for the measure $\mu^h$, but this fact is a consequence of the
theorem to follow, which will summarize the results of the past three
sections.

Theorem 10-41: The $\pi$-integrable non-negative $P$-superregular
functions $h$ stand in one-one correspondence with the non-negative
finite measures $\nu$ on the Borel sets of $S \cup B_e$, the correspondence of $h$ to $\nu$
being

$$h_i = \int_{S \cup B_e} K(i, x)\,d\nu(x)$$

and satisfying $\pi h = \nu(S \cup B_e)$. The unique representation $h = Nf + r$ with
$r$ regular arises by decomposing the integral over $S \cup B_e$ into a part over $S$
and a part over $B_e$. If $h$ is normalized, then the measure that corresponds
to $h$ is $\mu^h$.

Proof: By Fubini's Theorem any function of the form

$$h_i = \int_{S \cup B_e} K(i, x)\,d\nu(x)$$

with $\nu$ finite is non-negative superregular and has $\pi h = \nu(S \cup B_e)$.

Conversely, let $h$ be given. If $\pi h = 0$, then the superregularity of
$h$ implies that

$$0 = \pi h \ge \pi P^n h \ge 0$$

for all $n$ and hence $\pi P^n h = 0$ for all $n$. Thus $(\pi N)h = 0$. Since $\pi N$
is everywhere positive, $h = 0$. Thus existence and uniqueness of $\nu$
follow if $\pi h = 0$. Next, let $\pi h$ be positive. Since $h$ and $\nu$ must be
related by $\pi h = \nu(S \cup B_e)$, we may, for both existence and uniqueness, divide
$h$ by an appropriate positive constant to obtain $\pi h = 1$. Uniqueness
of $\nu$ and the fact that $\nu = \mu^h$ then follow from Theorem 10-40. Existence
of $\nu$ follows from Theorem 10-26 provided we can show that
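$\mu^h(S^* - (S \cup B_e)) = 0$. By Lemma 10-32,

$$S^h \cup B_e^h \subset S^{h*} \cap (S \cup B_e).$$

Hence

$$\mu^h(S \cup B_e) = \mu^h(S^{h*} \cap (S \cup B_e)) \ge \mu^h(S^h \cup B_e^h).$$

But the right side equals one by Proposition 10-38. Thus $\mu^h(S \cup B_e) = 1$
and $\mu^h(S^* - (S \cup B_e)) = 0$.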

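Finally, $S \cup B_e$ is the disjoint union of $S$ and $B_e$, and an application of
Fubini's Theorem shows that

$$\int_{B_e} K(i, x)\,d\nu(x)$$

is regular. Since

$$\int_S K(i, x)\,d\nu(x) = \sum_{j \in S} N_{ij}\,\frac{\nu(\{j\})}{(\pi N)_j},$$

the representation $h = Nf + r$ is as asserted.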
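The potential part of the representation can be illustrated on a finite example. The sketch below assumes the Martin kernel has the form $K(i,j) = N_{ij}/(\pi N)_j$ on $S$; the matrix `P`, the vector `pi`, and the measure `nu` are hypothetical values, and `inverse` is a small helper written for this illustration. It checks that a measure concentrated on $S$ produces a potential $Nf$ with charge $f_j = \nu(\{j\})/(\pi N)_j$ and that $\pi h = \nu(S)$.

```python
from fractions import Fraction as F


def inverse(A):
    """Gauss-Jordan inverse of a small square matrix of Fractions."""
    n = len(A)
    M = [list(A[r]) + [F(int(r == c)) for c in range(n)] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Hypothetical transient chain and starting vector (illustration only).
P = [[F(0), F(1, 2), F(0)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 2), F(0)]]
pi = [F(1, 2), F(1, 4), F(1, 4)]
n = len(P)

# N = I + P + P^2 + ... = (I - P)^{-1} for a transient chain.
N = inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])
piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]


def K(i, j):
    """Assumed form of the Martin kernel on S: K(i, j) = N_ij / (pi N)_j."""
    return N[i][j] / piN[j]


nu = [F(1, 5), F(3, 10), F(1, 2)]                          # measure on S
h = [sum(K(i, j) * nu[j] for j in range(n)) for i in range(n)]
f = [nu[j] / piN[j] for j in range(n)]                     # charge of h
Nf = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]
```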

8. Analog of Fatou’s Theorem 

In the classical case of the disk, which was discussed in Section 1,
normalized Lebesgue measure $m$ on the circle has the distinguishing
property that it corresponds to the function 1. If $h$ is the non-negative
harmonic function corresponding to a measure $\nu$ and if
$\nu = fm + \mu_s$ is the Lebesgue decomposition of $\nu$ with respect to $m$,
then Fatou's Theorem asserts that for almost every $x$ $[m]$ on the
circle, $h(re^{i\theta}) \to f(x)$ whenever $re^{i\theta}$ converges to $x$ nontangentially. In
this section we shall prove a Markov chain analog of this theorem in
terms of the Martin boundary and the measures given by Theorem
10-41. As expected, harmonic measure $\mu$ will play the role of Lebesgue
measure.

Our procedure will be first to derive an almost everywhere statement 
in terms of the measure on the probability space and then to translate 
this statement into a result in terms of harmonic measure. As a 
preliminary to the first step, we consider a special case in the lemma 
below. 

We shall be dealing with expressions of the form $\lim_{n \to \infty} h(x_n)$ in
this section, where $h$ is non-negative and $P$-regular, and we shall adopt
the convention that $h(x_n(\omega)) = 0$ if $n > v$. This definition is motivated
by the following consideration: If $\bar P$ is the enlarged chain for $P$,
then a $P$-regular function $h$ extends to be regular for $\bar P$ if $h$ is defined
to be zero at the absorbing state; consequently if $\pi h$ is finite, $\{h(x_n)\}$ is a
martingale with $M[h(x_0)] = \pi h$.

Lemma 10-42: If $h \ge 0$ is a normalized bounded regular function,
then $\mu^h$ is absolutely continuous with respect to $\mu$ and

$$h_i = \int_{S \cup B_e} K(i, x)f(x)\,d\mu(x) = \int_{B_e} K(i, x)f(x)\,d\mu(x),$$

where $f$ is the Radon-Nikodym derivative of $\mu^h$ with respect to $\mu$.
The function $f$ may be taken to be zero on $S$, and if it is, then

$$\Pr[\lim_{n \to \infty} h(x_n) = f(x_v)] = 1.$$



Proof: Since $h$ is bounded, $h \le c\mathbf{1}$ for some constant $c > 1$.
Set $g = (c - 1)^{-1}(c\mathbf{1} - h)$, noting that

$$\mathbf{1} = \frac{c-1}{c}\,g + \frac{1}{c}\,h.$$

Then $g$ and $h$ are non-negative normalized
superregular functions, and Lemma 10-34 shows that

$$\mu = \frac{c-1}{c}\,\mu^g + \frac{1}{c}\,\mu^h.$$

Thus $\mu^h \le c\mu$ and $\mu^h$ is absolutely continuous with respect to $\mu$. By
the Radon-Nikodym Theorem there is a Borel function $f$ such that

$$\mu^h(C) = \int_C f(x)\,d\mu(x)$$

for all Borel sets $C$. Since $h$ is regular, $\mu^h(S) = 0$ and we may take $f$
to vanish on $S$.

By Theorem 10-41 we have

$$h_i = \int_{S \cup B_e} K(i, x)\,d\mu^h(x) = \int_{S \cup B_e} K(i, x)f(x)\,d\mu(x) = \int_{S^*} K(i, x)f(x)\,d\mu(x).$$

Now the argument in the proof of Proposition 10-21 shows that

$$\int_{S^*} K(i, x)f(x)\,d\mu(x) = \int_\Omega f(x_v(\omega))\,d\Pr_i(\omega).$$

Thus if $\Pr[x_0 = j \wedge \cdots \wedge x_n = i] > 0$,

$$M[f(x_v) \mid x_0 = j \wedge \cdots \wedge x_n = i] = M_i[f(x_v)] = h_i.$$

Similarly, if $b$ denotes the absorbing state in the enlarged chain, then

$$M[f(x_v) \mid x_0 = j \wedge \cdots \wedge x_n = b] = 0 = h_b.$$

That is, if $\mathcal{R}_n$ is the partition generated by $\{x_0, \ldots, x_n\}$, then

$$M[f(x_v) \mid \mathcal{R}_n] = h(x_n).$$

On one hand, the Borel field generated by the $\mathcal{R}_n$ is $\mathcal{G}$, and on the other
hand $f(x_v)$ is measurable over $\mathcal{G}$ since it is the composition of a Borel
function and a function for which the inverse image of every Borel
set is in $\mathcal{G}$. By Proposition 3-18,

$$\Pr[\lim h(x_n) = f(x_v)] = 1.$$



The general case of the almost everywhere statement in terms of the 
measure on the probability space is covered by the following theorem. 

Theorem 10-43: Let $h \ge 0$ be a normalized regular function and let
$\mu^h = f\mu + \mu_s$ be the Lebesgue decomposition of $\mu^h$ with respect to $\mu$
(where $f$ is taken to be zero on the state space $S$); then

$$\Pr[\lim_{n \to \infty} h(x_n) = f(x_v)] = 1.$$

Proof: If we let $k = \tfrac{1}{2}\mathbf{1} + \tfrac{1}{2}h$, then $k$ is a non-negative normalized
superregular function. Since $k$ is strictly positive, we have $S^k = S$
and $S^{k*} = S^*$. By Lemma 10-32, $B_e^k = B_e$. The function $h/k$ is a
bounded regular function for the $k$-process with

$$\sum_i (\pi_i k_i)\,\frac{h_i}{k_i} = \sum_i \pi_i h_i = 1,$$

and hence Lemma 10-42 yields a function $g$ with

$$\Big(\frac{h}{k}\Big)_i = \int_{S \cup B_e} K^k(i, x)g(x)\,d\mu^k(x)$$

and

$$\Pr^k\Big[\lim_n \frac{h(x_n)}{k(x_n)} = g(x_v)\Big] = 1.$$



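As was pointed out in the proof of Lemma 10-34, the limit $x_v$ for the
$k$-process is identical with $x_v$ for $P$, and also

$$\Pr^k[p] = \tfrac{1}{2}\Pr[p] + \tfrac{1}{2}\Pr^h[p]$$

for all $p$. Thus $\Pr^k[p]$ cannot be one unless $\Pr[p]$ and $\Pr^h[p]$ both
equal one, and we conclude that

$$\Pr\Big[\lim_n \frac{h(x_n)}{k(x_n)} = g(x_v)\Big] = 1$$

or

$$\Pr\Big[\lim_n \frac{2h(x_n)}{1 + h(x_n)} = g(x_v)\Big] = 1.$$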



Since $\{h(x_n)\}$ is a non-negative martingale, $\lim h(x_n)$ exists a.e. [Pr] and
is finite. Therefore the above identity implies that

$$\Pr\Big[\lim_n h(x_n) = \frac{g(x_v)}{2 - g(x_v)}\Big] = 1.$$

Thus to complete the proof, it suffices to prove that

$$f(x) = \frac{g(x)}{2 - g(x)} \quad \text{a.e. } [\mu].$$

First we identify $g$ as the Radon-Nikodym derivative of $\mu^h$ with
respect to $\mu^k$. On one hand, we have

$$h_i = k_i \int_{S \cup B_e} K^k(i, x)g(x)\,d\mu^k(x) = \int_{S \cup B_e} K(i, x)g(x)\,d\mu^k(x).$$

On the other hand, Theorem 10-41 gives

$$h_i = \int_{S \cup B_e} K(i, x)\,d\mu^h(x),$$

and thus the uniqueness part of Theorem 10-41 gives $\mu^h = g\mu^k$.
Now, by Lemma 10-34, we have $\mu^k = \tfrac{1}{2}\mu + \tfrac{1}{2}\mu^h$. Hence

$$f\mu + \mu_s = \mu^h = g\mu^k = \tfrac{1}{2}g\mu + \tfrac{1}{2}g\mu^h = \tfrac{1}{2}g\mu + \tfrac{1}{2}gf\mu + \tfrac{1}{2}g\mu_s$$

or

$$(2f - g - fg)\mu = (g - 2)\mu_s.$$

Since $\mu$ and $\mu_s$ are singular with respect to each other, each side is the
zero measure. For the left side, this statement means that

$$2f - g - fg = 0 \quad \text{a.e. } [\mu]$$

or

$$f = \frac{g}{2 - g} \quad \text{a.e. } [\mu].$$


The corollary to this theorem is the analog of Fatou's Theorem; it is
a translation of the theorem into a result in terms of harmonic measure.
The statement of the corollary needs a way of singling out for attention
a single point $x$ of $S^*$. One way of proceeding is to use the $K(\cdot, x)$-process,
at least if $x$ is in $S \cup B_e$; for in that case Proposition 10-35 shows
that

$$\Pr^{K(\cdot,x)}[x_v = x] = 1.$$

Corollary 10-44: Let $h \ge 0$ be a normalized regular function, and let
$\mu^h = f\mu + \mu_s$ (with $f$ equal to zero on $S$) be the Lebesgue decomposition
of $\mu^h$ with respect to $\mu$. Then for almost every $x$ $[\mu]$ for which $K(\cdot, x)$
is normalized,

$$\Pr^{K(\cdot,x)}[\lim_{n \to \infty} h(x_n) = f(x)] = 1.$$

Proof: By Theorem 10-43,

$$\Pr[\lim h(x_n) = f(x_v)] = 1.$$

Then by Lemma 10-39,

$$1 = \Pr[\lim h(x_n) = f(x_v)] = \int_{S \cup B_e} \Pr^{K(\cdot,x)}[\lim h(x_n) = f(x_v)]\,d\mu(x).$$

Since

$$\Pr^{K(\cdot,x)}[\lim h(x_n) = f(x_v)] \le 1$$

for every $x$, equality must hold for almost every $x$ $[\mu]$. But

$$\Pr^{K(\cdot,x)}[x_v = x] = 1$$

or

$$\Pr^{K(\cdot,x)}[f(x_v) = f(x)] = 1$$

for almost all $x$ in $S \cup B_e$. Since $\mu(S \cup B_e) = 1$, the corollary follows.







The results above clearly extend to all $\pi$-integrable regular $h \ge 0$
provided we replace $\mu^h$ everywhere by the unique measure $\nu$ associated
to $h$ by Theorem 10-41.

Since the function $f$ in all three of the above results is equal to zero
on $S$, we may think of $f$ as a function on $B_e$. If $f$ is so restricted, it is
called the fine boundary function of $h$.

9. Fine boundary functions 

The results of Section 8 may be extended to non-negative $\pi$-integrable
superregular functions with the help of the proposition below. Our
convention that $h(x_n(\omega)) = 0$ if $n > v(\omega)$ is still in force.

Proposition 10-45: If $g$ is a $\pi$-integrable function of the form $Nf$ with
$f \ge 0$, then

$$\Pr[\lim_{n \to \infty} g(x_n) = 0] = 1.$$

For almost every $x$ $[\mu]$ for which $K(\cdot, x)$ is normalized,

$$\Pr^{K(\cdot,x)}[\lim_{n \to \infty} g(x_n) = 0] = 1.$$

Proof: In the first statement, $g$ is non-negative superregular and $\pi g$
is finite; hence $\{g(x_n)\}$ is a non-negative supermartingale and $g(x_n) \to z \ge 0$
a.e. [Pr]. Now $g \ge P^n g$ for all $n$, and $P^n g \to 0$. By dominated
convergence $\pi P^n g \to 0$. Thus

$$M_\pi[z] \le \lim M_\pi[g(x_n)] = \lim (\pi P^n g) = 0$$

and $z = 0$ a.e. [Pr]. The proof of the second statement in the proposition
is the same as the proof of Corollary 10-44.
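The two facts used in this proof, $g \ge P^n g$ and $\pi P^n g \to 0$ for a potential $g = Nf$, can be observed on a small example. The sketch below is an illustration under stated assumptions: the matrix `P`, the vector `pi`, and the charge `f` are hypothetical, and `inverse` and `apply_P` are helpers written for this example.

```python
from fractions import Fraction as F


def inverse(A):
    """Gauss-Jordan inverse of a small square matrix of Fractions."""
    n = len(A)
    M = [list(A[r]) + [F(int(r == c)) for c in range(n)] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Hypothetical transient chain, starting vector, and charge (illustration only).
P = [[F(0), F(1, 2), F(0)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 2), F(0)]]
pi = [F(1, 2), F(1, 4), F(1, 4)]
f = [F(1), F(0), F(1)]
n = len(P)

N = inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])
g = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]   # potential Nf


def apply_P(v):
    """Return the column vector P v."""
    return [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]


def pi_Pn_g(steps):
    """Return pi P^steps g, the mean of g(x_steps) under the starting vector pi."""
    v = list(g)
    for _ in range(steps):
        v = apply_P(v)
    return sum(pi[i] * v[i] for i in range(n))
```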

Thus if $h \ge 0$ is $\pi$-integrable and superregular, we may write, according
to Theorem 5-10, $h = Nf + r$, with $r$ regular and $Nf$ and $r$ both
$\pi$-integrable and non-negative. Corollary 10-44 and Proposition
10-45 may therefore be combined into a single result whenever
necessary.

The fine boundary function $f$ of a normalized minimal regular function
$h$ takes on an especially simple form. By Lemma 10-33, we must
have $h = K(\cdot, y)$ for some $y$ in $B_e$, and, by Proposition 10-35,

$$\mu^h(\{y\}) = 1.$$

There are two cases. First, if $\mu(\{y\}) = 0$, then $\mu^h$ is singular with
respect to $\mu$ and the fine boundary function of $h$ is zero. By Lemma
10-42, $h$ is unbounded. Second, if $\mu(\{y\}) = a > 0$, then the fine
boundary function may be taken as $1/a$ at $y$ and 0 elsewhere, since

$$\mu^h(C) = \chi_C(y) = \int_C f(x)\,d\mu(x)$$

for all Borel sets $C$. Moreover, $h$ is bounded by $1/a$ since

$$1 \ge \Pr_i[x_v(\omega) = y] = \int_{\{y\}} K(i, x)\,d\mu(x) = aK(i, y) = ah_i.$$

The class of $\pi$-integrable regular functions $h \ge 0$ with a given
$\mu$-integrable non-negative function $f$ as fine boundary function is
exactly the class of functions $h$ for which

$$\nu = f\mu + \mu_s$$

is the Lebesgue decomposition with respect to $\mu$ of the measure $\nu$
associated to $h$. Thus the class of such functions $h$ is exactly the class
of functions of the form

$$h_i = \int_{B_e} K(i, x)f(x)\,d\mu(x) + \int_{B_e} K(i, x)\,d\mu_s(x),$$

where $\mu_s$ is any Borel measure singular with respect to $\mu$. In this class
there is a unique smallest such function

$$\bar h_i = \int_{B_e} K(i, x)f(x)\,d\mu(x).$$

On one hand, Theorem 10-43 gives $\lim h(x_n) = f(x_v)$ a.e. [Pr], and
hence

$$M_\pi[\lim h(x_n)] = M_\pi[f(x_v)] = \int_{B_e} f(x)\,d\mu(x) = \pi\bar h.$$

On the other hand,

$$\lim M_\pi[h(x_n)] = \lim \pi P^n h = \pi h.$$

If $h_j > \bar h_j$ for some $j$, choose $n$ so that $(\pi P^n)_j > 0$. Then

$$\pi h = \pi P^n h > \pi P^n \bar h = \pi\bar h.$$

Thus by Proposition 1-52, $\{h(x_n)\}$ is uniformly integrable if and only if
$h = \bar h$.

There is a different topology for $S^*$ which is occasionally referred to
in connection with fine boundary functions. The fine topology for $S^*$
is defined in terms of its neighborhood system as follows: For any
point $x$ in $S^* - B_e$, every set in $S^*$ containing $x$ is to be a neighborhood
of $x$. For $x$ in $B_e$, the neighborhoods of $x$ are the sets $A$ in $S^*$ such that
$x$ is in $A$ and

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] = 1.$$







Evidently $x$ is in each of its neighborhoods, the intersection of two
neighborhoods is a neighborhood, and a superset of a neighborhood is a
neighborhood. We must check that $S^*$ is a neighborhood of $x$; that is,
that the $K(\cdot, x)$-process disappears with probability zero. But this
fact is a consequence of Proposition 10-35.

If $x$ is in $B_e$ and if $A$ is an open set containing $x$ in the metric
topology of $S^*$, then

$$1 \ge \Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] \ge \Pr^{K(\cdot,x)}[x_v \in A] = \mu^{K(\cdot,x)}(A) = 1.$$

Therefore $A$ is open in the fine topology, and the fine topology is
stronger than the metric topology.

The next lemma and proposition show that a zero-one law holds for
the probabilities which define the fine topology. The lemma by itself
is of value in checking whether a non-negative regular function is
minimal.



Lemma 10-46: A non-negative normalized regular function $h$ is
minimal for $P$ if and only if the only bounded regular functions for
$P^h$ (restricted to $S^h$) are constants.

Proof: If $h$ is minimal, then the regularity of $h$ implies that $\mathbf{1}$ is
regular for $P^h$ restricted to $S^h$. By adding a suitable multiple of $\mathbf{1}$ to a
given bounded regular function for $P^h$, we may assume that it is
a non-negative bounded regular function for $P^h$. Thus let $\bar h$ with
$0 \le \bar h \le c\mathbf{1}$ be a regular function for $P^h$ restricted to $S^h$. Set

$$k_i = \begin{cases} \bar h_i h_i & \text{if } i \in S^h \\ 0 & \text{otherwise.} \end{cases}$$

Then $k$ is $P$-regular and satisfies $0 \le k_i \le ch_i$. Since $h$ is minimal,
$k_i = c'h_i$ for all $i$ and hence $\bar h_i = c'$ for $i$ in $S^h$. That is, $\bar h = c'\mathbf{1}$.
Conversely, if $h$ is not minimal, find a regular function $k$ with
$0 \le k \le h$ and $k \ne ch$. Set

$$\bar h_i = k_i/h_i \quad \text{for } i \in S^h.$$

Then $\bar h$ is a non-constant regular function for $P^h$ restricted to $S^h$.



Proposition 10-47: If $x$ is in $B_e$, then, for any subset $A$ of $S$,

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] = 0 \text{ or } 1.$$

Proof: The $h$-process for $h = K(\cdot, x)$ disappears with probability
zero, and thus

$$\Pr^{K(\cdot,x)}[x_n \in S] = 1.$$

Therefore

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] = \Pr^{K(\cdot,x)}[x_n \in S - A \text{ only finitely often}] = 1 - \Pr^{K(\cdot,x)}[x_n \in S - A \text{ infinitely often}].$$

By Lemma 10-46, the only bounded regular functions for this process
are the constants, and thus, by Proposition 5-19, this last expression
must equal zero or one.



Proposition 10-47 shows that fine neighborhoods of $x$ in $B_e$ are those
sets $A$ in $S^*$ for which $x$ is in $A$ and

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] > 0.$$

The complement of a fine neighborhood of $x$ is called thin at $x$. Such
sets $A$ are characterized by the property

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ infinitely often}] = 0.$$

By Proposition 10-47 this probability must again be zero or one.

Let $h \ge 0$ be a $\pi$-integrable regular function with fine boundary
function $f$. The fine topology for $S^*$ has the property that the function
$h \cup f$ obtained by extending $h$ to $S^*$ by $f$ is continuous at almost every
$[\mu]$ point $x$ in $S^*$. In fact, the statement is trivial for $x$ not in $B_e$, and
it therefore suffices by Corollary 10-44 to prove the result for every
$x$ in $B_e$ for which

$$\Pr^{K(\cdot,x)}[\lim h(x_n) = f(x)] = 1.$$

Let $a < f(x) < b$. We are to produce a fine neighborhood of $x$ such
that

$$a < (h \cup f)(y) < b$$

for all $y$ in that neighborhood. Let $A$ be the set of points of $S$ for which
$a < h < b$ and form the set $A \cup \{x\}$. We shall show this is a neighborhood
of $x$. The convergence of $h(x_n)$ to $f(x)$ implies that

$$\Pr^{K(\cdot,x)}[a < h(x_n) < b \text{ from some time on}] = 1$$

or

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] = 1.$$

Therefore

$$\Pr^{K(\cdot,x)}[x_n \in A \cup \{x\} \text{ from some time on}] = 1$$

and $A \cup \{x\}$ is a fine neighborhood of $x$.

10. Martin entrance boundary

The Martin exit boundary was defined in terms of $P$ and a vector
$\pi \ge 0$ such that $\pi N$ is strictly positive. The completion of $S$ in the
metric $d$ served for a representation of all $\pi$-integrable superregular
functions $h \ge 0$ in terms of finite measures on this space.

In this section we shall introduce a different completion of $S$
which will serve for the representation of $P$-superregular measures.
As the analog of $\pi$, we fix once and for all a function $f \ge 0$ such that
$Nf$ is everywhere positive and finite-valued. The representation
theorem will be for superregular measures $\sigma \ge 0$ for which $\sigma f$ is finite.

The formalism is as follows: For $i$ and $j$ in $S$ we define

$$J(i, j) = \frac{N_{ij}}{(Nf)_i}.$$

Then the measure $J(i, \cdot)$ is $P$-regular everywhere except at $i$, where it
is strictly superregular, and it satisfies $J(i, \cdot)f = 1$ for all $i$. For each
$j$ the function $J(\cdot, j)$ is bounded by $N_{jj}/(Nf)_j$ because, if $g$ is defined as $Nf$,
then $g_i \ge H_{ij}g_j$ and

$$J(i, j) = \frac{N_{ij}}{g_i} = \frac{H_{ij}N_{jj}}{g_i} \le \frac{N_{jj}}{g_j}.$$

We define

$$d'(j, j') = \sum_i w_i\,\frac{(Nf)_i}{N_{ii}}\,|J(j, i) - J(j', i)|,$$

where the $w_i$ are positive weights such that $\sum_i w_i$ is finite. The
bound we have just computed shows that $d'$ is finite-valued, and $d'$ is a
metric if we can show that $d'(j, j') = 0$ implies $j = j'$. But if
$d'(j, j') = 0$, then

$$J(j, \cdot) = J(j', \cdot).$$

Multiplying through by $P$ and supposing that $j \ne j'$, we obtain

$$J(j, j') = \sum_i J(j, i)P_{ij'} = \sum_i J(j', i)P_{ij'} < J(j', j'),$$

the strict inequality holding since $J(j', \cdot)$ is not regular at $j'$. Thus
$j = j'$ and $d'$ is a metric.
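The stated properties of the kernel $J(i,j) = N_{ij}/(Nf)_i$ can be verified on a finite example. The sketch below is illustrative only: the matrix `P` and the function `f` are hypothetical, and `inverse` is a helper written for this example. It checks the normalization $J(i,\cdot)f = 1$, the bound $J(i,j) \le N_{jj}/(Nf)_j$, and regularity of $J(i,\cdot)$ everywhere except at $i$.

```python
from fractions import Fraction as F


def inverse(A):
    """Gauss-Jordan inverse of a small square matrix of Fractions."""
    n = len(A)
    M = [list(A[r]) + [F(int(r == c)) for c in range(n)] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Hypothetical transient chain and reference function f (illustration only).
P = [[F(0), F(1, 2), F(0)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 2), F(0)]]
f = [F(1), F(1), F(1)]
n = len(P)

N = inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])
Nf = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]


def J(i, j):
    """Entrance kernel J(i, j) = N_ij / (Nf)_i."""
    return N[i][j] / Nf[i]


def J_times_P(i, j):
    """The j-th entry of the row vector J(i, .) P."""
    return sum(J(i, k) * P[k][j] for k in range(n))
```

Because $NP = N - I$, the row vector $J(i,\cdot)P$ differs from $J(i,\cdot)$ only in the $i$-th entry, which is what the regularity statement asserts.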

We define ${}^*S$ to be the Cauchy completion of $S$ in the metric $d'$; the
set ${}^*S - S$ is the Martin entrance boundary. A sequence $\{j_n\}$ is Cauchy
in $S$ if and only if the sequence $\{J(j_n, i)\}$ is Cauchy for every $i$. Consequently
$J(\cdot, i)$ extends to a continuous function on ${}^*S$. Then $\{x_n\}$
is Cauchy in ${}^*S$ if and only if $\{J(x_n, i)\}$ is Cauchy for every $i$.
Two points $x$ and $y$ are equal if and only if $J(x, \cdot) = J(y, \cdot)$. The
space ${}^*S$ is compact.

A $P$-superregular measure $\sigma$ is normalized if $\sigma f = 1$. It is minimal if it
is regular and if $0 \le \bar\sigma \le \sigma$ with $\bar\sigma$ regular implies $\bar\sigma = c\sigma$. Application
of Fatou's Theorem shows that $J(x, \cdot)$ is a $P$-superregular measure
for each $x$ and that it satisfies $J(x, \cdot)f \le 1$. A point $x$ of ${}^*S$ is extreme
if $J(x, \cdot)$ is minimal and normalized, and the set of extreme points is
denoted $B^e$. Then $B^e \cap S$ is empty.

The next theorem is the Markov chain analog for the Poisson-Martin
Representation Theorem for $P$-superregular measures.

Theorem 10-48: The sets $S$ and $B^e$ are Borel subsets of ${}^*S$. The
non-negative $P$-superregular measures $\sigma$ with $\sigma f$ finite stand in one-one
correspondence with the non-negative finite measures $\nu$ on the Borel
sets of $S \cup B^e$, the correspondence of $\sigma$ to $\nu$ being

$$\sigma_i = \int_{S \cup B^e} J(x, i)\,d\nu(x)$$

and satisfying $\sigma f = \nu(S \cup B^e)$. The unique representation $\sigma = \gamma N + \rho$
with $\rho$ regular arises by decomposing the integral over $S \cup B^e$
into a part over $S$ and a part over $B^e$.



The proof will be accomplished by using duality, but we shall isolate
several of the steps beforehand. Let $\alpha > 0$ be a superregular measure
such that $\alpha f = 1$. Duality in the remainder of this section will mean
$\alpha$-duality. Let $\hat P = \text{dual } P$ and $\hat\pi = \text{dual } f$. Since

$$0 < \text{dual } g = \text{dual } (Nf) = \hat\pi\hat N$$

and

$$1 = \alpha f = (\text{dual } f)(\text{dual } \alpha) = \hat\pi\mathbf{1},$$

the exit boundary of $\hat P$ relative to $\hat\pi$ is defined. Let $\hat d$ be the defining
metric of the exit boundary, and let $\hat S^*$ be the completion of $S$ under
$\hat d$. We have

$$(\text{dual } J(j, \cdot))_i = \frac{N_{ji}}{(Nf)_j\,\alpha_i} = \hat K(i, j).$$





Lemma 10-49: If the same weights are used in defining $\hat d$ and $d'$,
then the identity map on $S$ extends to an isometry of $({}^*S, d')$ onto
$(\hat S^*, \hat d)$.

Proof: We have

$$d'(j, j') = \sum_i w_i\,\frac{(Nf)_i}{N_{ii}}\,|J(j, i) - J(j', i)| = \sum_i w_i\,\frac{(\hat\pi\hat N)_i}{\hat N_{ii}}\,|\hat K(i, j) - \hat K(i, j')| = \hat d(j, j').$$

Under the identification of ${}^*S$ and $\hat S^*$,

$$J(x, \cdot) = \text{dual } \hat K(\cdot, x)$$

by the continuity of the functions $J(\cdot, i)$ and $\hat K(i, \cdot)$. Under duality a
$\hat P$-superregular function $h$ corresponds to a $P$-superregular row vector
$\sigma = \text{dual } h$. Since $\hat P = \text{dual } P$ and $\hat\pi = \text{dual } f$, regularity and
normalization are both preserved. But minimality is also preserved
since duality preserves inequalities. Therefore $B^e = \hat B_e$.
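The duality relations used here can be checked on a small example. The sketch below assumes the usual $\alpha$-duality conventions, which are stated as assumptions for this illustration: $\hat P_{ij} = \alpha_j P_{ji}/\alpha_i$, the dual of a column vector $f$ is the row vector $(f_i\alpha_i)$, and the dual of a row vector $\beta$ is the column vector $(\beta_i/\alpha_i)$. The values of `P`, `alpha`, and `f` are hypothetical, and `inverse` is a helper written for the example.

```python
from fractions import Fraction as F


def inverse(A):
    """Gauss-Jordan inverse of a small square matrix of Fractions."""
    n = len(A)
    M = [list(A[r]) + [F(int(r == c)) for c in range(n)] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental(Q):
    """N = (I - Q)^{-1} for a transient matrix Q."""
    m = len(Q)
    return inverse([[F(int(i == j)) - Q[i][j] for j in range(m)] for i in range(m)])


# Hypothetical data: alpha is a positive superregular measure with alpha f = 1.
P = [[F(0), F(1, 2), F(0)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 2), F(0)]]
alpha = [F(1), F(2), F(1)]
f = [F(1, 4), F(1, 4), F(1, 4)]
n = len(P)

P_hat = [[alpha[j] * P[j][i] / alpha[i] for j in range(n)] for i in range(n)]
pi_hat = [f[i] * alpha[i] for i in range(n)]               # dual of f

N, N_hat = fundamental(P), fundamental(P_hat)
Nf = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]
pi_hat_N_hat = [sum(pi_hat[i] * N_hat[i][j] for i in range(n)) for j in range(n)]


def J(j, i):
    """Entrance kernel for P and f."""
    return N[j][i] / Nf[j]


def K_hat(i, j):
    """Exit kernel for the dual chain, assumed of the form N^_ij / (pi^ N^)_j."""
    return N_hat[i][j] / pi_hat_N_hat[j]
```

With these conventions `J(j, i) / alpha[i]` and `K_hat(i, j)` agree exactly, which is the finite-state form of the relation between $J$ and the dual exit kernel.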



Proof of Theorem 10-48: Let $h = \text{dual } \sigma$, and apply Theorem 10-41.
Then

$$h_i = \int_{S \cup \hat B_e} \hat K(i, x)\,d\nu(x)$$

with $\hat\pi h = \nu(S \cup \hat B_e)$. Therefore

$$\sigma_i = \alpha_i h_i = \int_{S \cup \hat B_e} \alpha_i \hat K(i, x)\,d\nu(x) = \int_{S \cup \hat B_e} J(x, i)\,d\nu(x)$$

and $\sigma f = \nu(S \cup \hat B_e)$. If we identify $B^e$ and $\hat B_e$ and call $\nu$ by the same
name on $B^e$ as on $\hat B_e$, then we have

$$\sigma_i = \int_{S \cup B^e} J(x, i)\,d\nu(x)$$

as required. Conversely, if $\sigma$ is given by

$$\sigma_i = \int_{S \cup B^e} J(x, i)\,d\nu(x),$$

then

$$(\text{dual } \sigma)_i = \int_{S \cup \hat B_e} \hat K(i, x)\,d\nu(x).$$

Therefore dual $\sigma$ is $\hat P$-superregular and $\sigma$ must be superregular. Since
there is only one measure which represents dual $\sigma$, there can be only
one measure which represents $\sigma$. The last statement of the theorem
follows from the fact that the unique decomposition $\sigma = \gamma N + \rho$ is
transformed under duality into the unique decomposition of dual $\sigma$
in the $\hat P$-process.

If $\sigma > 0$, we can take $\alpha = \sigma$ and then the proof of Theorem 10-48
gives us the following interpretation for $\nu$: $\nu$ is harmonic measure for
the chain with transition matrix $\sigma$-dual $P$ and starting distribution
$\sigma$-dual $f$.



11. Application to extended chains 

In order to assign a probabilistic interpretation to the entrance
boundary, we must pass to extended chains. Except where noted,
we therefore assume throughout this section that $\{z_n\}$ is an extended
chain with state space $S \cup \{a\} \cup \{b\}$, with (enlarged) transition matrix
$P$, and with vector $\nu$ of mean times in the various states of $S$. Let $\pi$
and $f$ be row and column vectors, respectively, such that $\pi \ge 0$,
$f \ge 0$, $\pi\mathbf{1} = 1$, $\nu f = 1$, $\pi N > 0$, and $Nf > 0$. By Theorem 8-3, $Nf$ is
finite-valued. We continue to use the notations of Sections 3 through
10 for boundary theory of $P$ relative to $\pi$ and $f$, and we shall use the
vectors $\mu^E$ and $\nu^E$ defined for extended chains in Section 2.

Our first pair of propositions will give the "final" and "starting"
distributions for the extended chain. If $x$ is in ${}^*S$ and if $E$ ranges over
finite sets in $S$, we define

$$p(x) = \lim_{E \uparrow S} J(x, \cdot)e^E.$$

Similarly, if $y$ is in $S^*$, we define

$$q^\nu(y) = \lim_{E \uparrow S} \mu^E K(\cdot, y).$$

We first show that these limits always exist.



Lemma 10-50: For every $x$ in ${}^*S$ and $y$ in $S^*$, both $p(x)$ and $q^\nu(y)$
exist, possibly being infinite. Their defining limits are increasing
limits as $E \uparrow S$. Each is a Borel measurable function on its domain,
and the functions satisfy

$$p(x) = \lim_{E \uparrow S}\,\lim_{j \to x} \frac{h^E_j}{(Nf)_j}$$

and

$$q^\nu(y) = \lim_{E \uparrow S}\,\lim_{i \to y} \frac{\nu^E_i}{(\pi N)_i}.$$

Proof: For any finite set $E$,

$$h^E_j = \sum_{k \in E} N_{jk}e^E_k$$

and

$$\frac{h^E_j}{(Nf)_j} = \sum_{k \in E} J(j, k)e^E_k = J(j, \cdot)e^E.$$

Since $E$ is finite, we may pass to the limit as $j \to x$ to get

$$\lim_{j \to x} \frac{h^E_j}{(Nf)_j} = J(x, \cdot)e^E.$$

For each fixed $j$, the expression $h^E_j/(Nf)_j$ increases with $E$, and hence so
does the limit as $j \to x$. Therefore $p(x)$ exists, is defined by an increasing
limit, and has the asserted value. If we choose an increasing
sequence of finite sets $E_n$ with union $S$, then $p(x)$ is exhibited as the
limit of a sequence of continuous functions and is therefore Borel
measurable.

Similarly we have

$$\nu^E_i = \sum_{k \in E} \mu^E_k N_{ki}$$

and hence

$$\frac{\nu^E_i}{(\pi N)_i} = \mu^E K(\cdot, i).$$

Thus the same argument applies to $q^\nu(y)$, since $\nu^E$ increases with $E$.

Proposition 10-51: On almost every path,

$$z_v(\omega) = \lim_{n \uparrow v} z_n(\omega)$$

exists in $S^*$. On almost every path for which the limit exists, either
$v < +\infty$ and $z_v \in S$ or else $v = +\infty$ and $z_v \in B_e$. Moreover, for any
Borel set $C$ in $S^*$,

$$\Pr[z_v \in C] = \int_C q^\nu(y)\,d\mu(y).$$

Proof: The process watched starting in $E$ is a Markov chain with
transition matrix $P$ and starting distribution $\mu^E$ if $E$ is a finite set in $S$.
By Proposition 10-21, $z_v$ exists and has the required values for almost
every path in this chain, and in this chain

$$\Pr_{\mu^E}[z_v \in C] = \sum_{i \in E} \mu^E_i \Pr_i[z_v \in C] = \sum_{i \in E} \mu^E_i \int_C K(i, x)\,d\mu(x) = \int_C \mu^E K(\cdot, x)\,d\mu(x).$$

As $E$ increases to $S$, we obtain almost all paths of $\{z_n\}$ in this way.
Hence $z_v$ exists and is in $S \cup B_e$ a.e., and, by Definition 10-4,

$$\Pr[z_v \in C] = \lim_{E \uparrow S} \int_C \mu^E K(\cdot, y)\,d\mu(y).$$

By Lemma 10-50, the integrand increases with $E$. Replacing the sets
$E$ by an increasing sequence $E_n$ and applying monotone convergence,
we obtain

$$\Pr[z_v \in C] = \int_C \lim_{E \uparrow S} \mu^E K(\cdot, y)\,d\mu(y) = \int_C q^\nu(y)\,d\mu(y).$$

Proposition 10-52: On almost every path,

$$z_u(\omega) = \lim_{n \downarrow u} z_n(\omega)$$

exists in ${}^*S$. On almost every path for which the limit exists, either
$u > -\infty$ and $z_u \in S$ or else $u = -\infty$ and $z_u \in B^e$. Moreover, for any
Borel set $C$ in ${}^*S$,

$$\Pr[z_u \in C] = \int_C p(x)\,d\mu^\nu(x),$$

where $\mu^\nu$ is the unique measure on $S \cup B^e$ representing $\nu$.



Proof: Form the reverse process $\{\hat z_n\}$ and call its measure Pr'. Its
transition matrix restricted to the states of $S$ is the $\nu$-dual of $P$, denoted
$\hat P$. Form the exit and entrance boundaries of $\hat P$ relative to $\hat\pi = \nu$-dual $f$
and $\hat f = \nu$-dual $\pi$. By Proposition 10-51, the limit

$$\hat z_v = \lim_{n \uparrow v} \hat z_n$$

exists in $\hat S^*$ a.e. [Pr'], and either $v$ is finite and $\hat z_v \in S$ or else $v$ is infinite
and a.e. $\hat z_v \in \hat B_e$. Therefore

$$z_u = \lim_{n \downarrow u} z_n$$

exists in $\hat S^*$ a.e. [Pr], and either $u$ is finite and $z_u \in S$ or else $u$ is infinite
and a.e. $z_u \in \hat B_e$. Since $\hat S^*$ and ${}^*S$ are canonically identified according
to Lemma 10-49, the first part of the proposition follows. Moreover,
we have

$$\Pr[z_u \in C] = \Pr'[\hat z_v \in C] = \int_C \lim_{E \uparrow S} [\hat\mu^E \hat K(\cdot, x)]\,d\hat\mu(x),$$

where $\hat\mu$ is the measure on $\hat S^*$ which represents $\mathbf{1}$, that is, the measure $\mu^\nu$
on ${}^*S$ which represents $\nu$. The above expression is

$$\int_C \lim_{E \uparrow S} [J(x, \cdot)(\text{dual } \hat\mu^E)]\,d\mu^\nu(x).$$

But $\hat\mu^E$, by Proposition 10-8, is the balayage charge of $\nu$ on $E$ and its
dual is thus the balayage charge of $\mathbf{1}$ on $E$; that is, the charge $e^E$ of $h^E$.



We thus obtain $\int_C p(x)\,d\mu^\nu(x)$ and $\int_C q^\nu(y)\,d\mu(y)$ as starting and final
distributions for the extended chain. The fact that $\nu$ is normalized
($\nu f = 1$) implies that $\mu^\nu$ is a probability measure. Note that $p(x)$
depends only on $P$, $\pi$, and $f$ and not on the extended chain or even the
vector $\nu$, but that $q^\nu(y)$ depends on $P$, $\pi$, $f$, and $\nu$. We shall return to
this point shortly.

Conversely, if we select any probability measure $\mu^\sigma$ on $S \cup B^e$, then
Theorem 10-48 yields a unique normalized $P$-superregular measure $\sigma$
represented by $\mu^\sigma$. If $\sigma$ is positive, Theorem 10-9 assures the existence
of at least one extended chain with $\sigma$ as its vector of mean times. Any
such chain starts in $C$ with probability

$$\int_C p(x)\,d\mu^\sigma(x).$$

By varying $\mu^\sigma$ we may change the total measure on $\{z_n\}$, even as to
whether it is finite or infinite.

We consider two special cases. First, if 0 is in $S$, let

$$\sigma_j = J(0, j) = N_{0j}/(Nf)_0.$$

By the uniqueness of the representation we must have $\mu^\sigma(\{0\}) = 1$, and
hence we may represent $\sigma$ (provided it is positive) by an extended chain
which has a.e. path starting at 0. The total measure of the process
must be

$$\int_{{}^*S} p(x)\,d\mu^\sigma(x) = p(0)\mu^\sigma(\{0\}) = p(0),$$

and so the interpretation of $\sigma_j$ as the mean number of times in state $j$
shows that

$$\sigma_j = p(0)N_{0j}.$$

As a check, we can compute $p(0)$ directly. We have

$$p(0) = \lim_{E \uparrow S} \frac{h^E_0}{(Nf)_0} = \frac{1}{(Nf)_0},$$

as required.
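This check can be reproduced on a finite example, where $E \uparrow S$ terminates at $E = S$. The sketch below is an illustration under stated assumptions: `P` and `f` are hypothetical, $h^E$ is computed as the probability of ever being in $E$, and `inverse`, `hitting_probabilities`, and `p_E` are helpers written for the example. It shows that $h^E_j/(Nf)_j$ increases with $E$ and that $p(0) = 1/(Nf)_0$, so that $J(0,j) = p(0)N_{0j}$.

```python
from fractions import Fraction as F


def inverse(A):
    """Gauss-Jordan inverse of a small square matrix of Fractions."""
    n = len(A)
    M = [list(A[r]) + [F(int(r == c)) for c in range(n)] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Hypothetical transient chain and reference function (illustration only).
P = [[F(0), F(1, 2), F(0)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 2), F(0)]]
f = [F(1), F(1), F(1)]
n = len(P)
N = inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])
Nf = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]


def hitting_probabilities(E):
    """h^E_i: probability that the chain started at i is ever in the set E."""
    comp = [i for i in range(n) if i not in E]
    h = [F(1) if i in E else F(0) for i in range(n)]
    if comp:
        A = [[F(int(r == c)) - P[r][c] for c in comp] for r in comp]
        A_inv = inverse(A)
        b = [sum(P[r][j] for j in E) for r in comp]        # one-step entry into E
        for a, r in enumerate(comp):
            h[r] = sum(A_inv[a][c] * b[c] for c in range(len(comp)))
    return h


def p_E(E):
    """The ratios h^E_j / (Nf)_j whose increasing limit defines p."""
    h = hitting_probabilities(E)
    return [h[j] / Nf[j] for j in range(n)]


def J(i, j):
    """Entrance kernel J(i, j) = N_ij / (Nf)_i."""
    return N[i][j] / Nf[i]
```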

Second, choose $\sigma = J(x, \cdot)$, where $x$ is in $B^e$. Then $\mu^\sigma(\{x\}) = 1$,
and (if $\sigma$ is positive) an extended chain representing $\sigma$ in the sense of
Theorem 10-9 must start a.e. at $x$ with total measure $p(x)$. Therefore,
$J(x, \cdot)$ may be interpreted as the vector of mean times for any extended
chain which is started almost surely at $x$, and, if $p(x)$ is finite, then

$$\frac{J(x, \cdot)}{p(x)}$$

is the conditional mean of the number of times in the various states
given that the process starts at $x$.

Returning to the case of a general $\nu$ and supposing that $p(x)$ is finite
a.e. $[\mu^\nu]$, we see that the identity

$$\nu_j = \int_{S \cup B^e} \frac{J(x, j)}{p(x)}\,p(x)\,d\mu^\nu(x)$$

may be interpreted as follows: $\nu_j$ may be computed by choosing a
starting state $x$ according to the starting distribution and then weighting
it by the conditional mean of the number of times in $j$ given that
the process starts at $x$.

As we noted earlier, there is an asymmetry between $p(x)$ and $q^\nu(y)$
in that one of these quantities depends on $\nu$ and the other does not.
This asymmetry is cleared up by a discussion of $h$-processes for $P$; we
shall see that $p(x)$ depends on $h$ and that $q^\nu(y)$ does not. The special
case we have been considering so far will be seen to be the case $h = \mathbf{1}$.

In discussing the entrance boundary for $P^h$, where $h > 0$ is superregular
and normalized, we choose

$$f^h_i = f_i/h_i$$

in analogy with the choice of $\pi^h$. Define $\sigma$ by

$$\sigma_i = \nu_i h_i.$$

Then $\sigma$ is positive, is $P^h$-superregular, and satisfies $\sigma f^h = 1$.

Direct calculation shows that

$$\sigma\text{-dual } (P^h) = \nu\text{-dual } P = \hat P$$

and that

$$\sigma\text{-dual } (f^h) = \nu\text{-dual } f = \hat\pi.$$

Therefore the entrance boundary for $P^h$ and $f^h$ is the same as that for
$P$ and $f$.

Now let $\{z'_n\}$ be an extended chain representing $P^h$ and $\sigma$ (existence
by Theorem 10-9), and take $\pi^h$ and $f^h$ as the functions relative to which
the exit and entrance boundaries are formed. The process starts in ${}^*S$
and goes to $S^{h*} = S^*$. To obtain the starting and final distributions,
we need the functions $p$ and $q$. For $p$ we have



$$p^h(x) = \lim_{E \uparrow S}\,\lim_{j \to x} \frac{(B^E h)_j/h_j}{(N^h f^h)_j} = \lim_{E \uparrow S}\,\lim_{j \to x} \frac{\frac{1}{h_j}\sum_{k \in E} B^E_{jk}h_k}{(Nf)_j/h_j} = \lim_{E \uparrow S}\,\lim_{j \to x} \frac{(B^E h)_j}{g_j}.$$

For $q$ we have

$$q^\sigma(y) = \lim_{E \uparrow S} \sum_{i \in E} (\mu^E_i h_i)K^h(i, y) = \lim_{E \uparrow S} \sum_{i \in E} \mu^E_i h_i K(i, y)/h_i = q^\nu(y).$$

Therefore, in the chain $\{z'_n\}$ we have as starting and final distributions

$$\Pr^h[z'_u \in C] = \int_C p^h(x)\,d\mu^\nu(x)$$

and

$$\Pr^h[z'_v \in C] = \int_C q^\nu(y)\,d\mu^h(y).$$

These relations show the symmetric roles of $p$ and $q$ and give us symmetric
interpretations for $\mu^\nu$ and $\mu^h$.

In the special case of positive minimal $\nu$ and $h$, we must have

$$\nu = J(x_0, \cdot)$$

and

$$h = K(\cdot, y_0),$$

where $x_0$ and $y_0$ are in $B^e$ and $B_e$, respectively. In this case

$$\mu^\nu(\{x_0\}) = \mu^h(\{y_0\}) = 1.$$

Thus the process $\{z'_n\}$ has almost every path starting at $x_0$ and going to
$y_0$. The paths which start at $x_0$ have measure $p^h(x_0)$, and the paths
which go to $y_0$ have measure $q^\nu(y_0)$. Therefore we must have

$$p^h(x_0) = q^\nu(y_0).$$



12. Proof of Theorem 10-9 

(1) Construction of the extended stochastic process. Let
$\bar P = P_{S \cup \{b\}}$ and $P = P_S$.

Redefine $\nu$ as being restricted to the domain $S$. Let $\nu^E$ be the balayage
potential of $\nu$ on a finite set $E$ with $\mu^E$ the balayage charge. For each
$E$ we shall consider a Markov chain with transition matrix $\bar P$ and
starting distribution $\mu^E$, and we shall combine these into a single
extended stochastic process by choosing a common time scale. For
the sake of convenience we assume that $S = \{0, 1, 2, \ldots\}$. Our measure
space will, as usual, be the double sequence space obtained from $S$, $a$,
and $b$. To define the process $\{x_n\}$ it is sufficient to assign probabilities
consistently to basic statements.

We shall use $\{y_n\}$ to denote the outcome functions for the Markov
chain $\bar P$ with various $\mu^E$ as starting distributions. These vectors are
finite measures, but not necessarily probability measures. Since $S$ is
the set of non-negative integers, it has a natural ordering on it. For
each path $\omega$ in the ordinary sequence space of $\bar P$, let $s(\omega)$ be the smallest
numbered state on path $\omega$ and let $t(\omega)$ be the time that $s(\omega)$ is first
entered. (Then $s$ and $t$ are defined everywhere except on the one
path $(b, b, b, b, \ldots)$.)

There are two kinds of probability assignments to be made on the
basic cylinder sets. First, if $i \in S$, define

$$\Pr[x_n = i \wedge x_{n+1} = j \wedge \cdots \wedge x_{n+m} = k]
  = \Pr_{\mu^E}[y_{t+n} = i \wedge y_{t+n+1} = j \wedge \cdots \wedge y_{t+n+m} = k],$$

where $E = \{i\}$ and where the right side is taken as 0 for the set of
$\omega$ on which $t(\omega) + n < 0$. Second, define

Pr[x_j^_^ = a A • • • A iC-n -2 = « A x_^_^ = a 

A x_^ = i A x_^^^ = j A • • • A + = k] 

= Pi Prf[2/i = j A - • A ym = k A t = n]. 

The effect of these definitions is to fix a time scale with this property:
For each path there is a designated state such that the first entry into
that state occurs at time 0. Then for $i \in S$, $\Pr[x_n = i]$ is finite
for all $n$. To show that $\{x_n\}$ is an extended stochastic process, we
must check that the above definitions on cylinder sets are consistent.
That is, we must show typically that

$$\sum_k \Pr[x_n = i \wedge x_{n+1} = j \wedge x_{n+2} = k]
  = \Pr[x_n = i \wedge x_{n+1} = j],$$

when $i$ and $j$ do not both equal $a$, and that

$$(*) \qquad \sum_i \Pr[x_{-n-1} = i \wedge x_{-n} = j \wedge x_{-n+1} = k]
  = \Pr[x_{-n} = j \wedge x_{-n+1} = k],$$

when $j$ and $k$ do not both equal $b$. The first identity is immediate
from the above definitions. But to prove the second identity, we need
some alternate expressions for the left side.

Let $i \in S$, let $E = \{i\}$, let $F$ be a finite set with
$E \subset F$, and let $q$ be the statement
$y_{t+n} = i \wedge y_{t+n+1} = j \wedge y_{t+n+2} = k$. By the first
definition above,

$$\Pr[x_n = i \wedge x_{n+1} = j \wedge x_{n+2} = k] = \Pr_{\mu^E}[q].$$

We shall show that

$$\Pr_{\mu^F}[q] = \Pr_{\mu^E}[q].$$

Since the truth of $q$ depends only on events after the time when $E$
is reached, Theorem 4-10 gives

= 2 Pr„[g] 

meE 

= 2 Pr„M. 

meE 
Now since and /x^ are balayage charges (see Section 2).

Therefore 

= 2 Pr„[g-] = 

meE 

It follows that 

FIS 

which is the first of the two alternate identities we need. 

Next, we note that $\nu(I - P) = \mu$ by Theorem 5-10. Hence
$\mu_i = \lim_{F \uparrow S} \mu^F_i$ by the property of balayage charges
that they converge to the product of the given superregular measure and
$I - P$. Thus

$$
\begin{aligned}
\Pr[x_{-n-2} = a &\wedge x_{-n-1} = a \wedge x_{-n} = i \wedge x_{-n+1} = j \wedge x_{-n+2} = k] \\
&= \mu_i \Pr_i[y_1 = j \wedge y_2 = k \wedge t = n] \\
&= \lim_F \mu^F_i \Pr_i[y_1 = j \wedge y_2 = k \wedge t = n] \\
&= \lim_F \Pr_{\mu^F}[y_0 = i \wedge y_1 = j \wedge y_2 = k \wedge t = n],
\end{aligned}
$$

which is our second alternate identity.

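Now we are in a position to prove (*). In the proof we shall use the
abbreviation

$$\Gamma = \lim_F \sum_{m=1}^{\infty} \sum_{i \in S-F}
  \Pr_{\mu^F}[y_{m-1} = i \wedge y_m = j \wedge y_{m+1} = k \wedge t = m + n].$$

We begin by using our two alternate identities and by performing a
direct calculation. Recall that $i$ cannot equal $b$, since $j \ne b$.

$$
\begin{aligned}
\sum_{i \in S \text{ or } i = a} &\Pr[x_{-n-1} = i \wedge x_{-n} = j \wedge x_{-n+1} = k] \\
&= \lim_{F \uparrow S} \sum_{i \in F} \Pr_{\mu^F}[y_{t-n-1} = i \wedge y_{t-n} = j \wedge y_{t-n+1} = k] \\
&\qquad + \lim_{F \uparrow S} \Pr_{\mu^F}[y_0 = j \wedge y_1 = k \wedge t = n] \\
&= \lim_F \Big\{ \sum_{m=1}^{\infty} \sum_{i \in F}
   \Pr_{\mu^F}[y_{m-1} = i \wedge y_m = j \wedge y_{m+1} = k \wedge t = m + n] \\
&\qquad + \Pr_{\mu^F}[y_0 = j \wedge y_1 = k \wedge t = n] \Big\} \\
&= \lim_F \sum_{m=0}^{\infty} \Pr_{\mu^F}[y_m = j \wedge y_{m+1} = k \wedge t = m + n] - \Gamma \\
&= \lim_F \Pr_{\mu^F}[y_{t-n} = j \wedge y_{t-n+1} = k] - \Gamma \\
&= \Pr[x_{-n} = j \wedge x_{-n+1} = k] - \Gamma.
\end{aligned}
$$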







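Thus we are to prove $\Gamma = 0$. Now

$$
\begin{aligned}
0 \le \Gamma &\le \lim_F \sum_{m=1}^{\infty} \sum_{i \in S-F} \Pr_{\mu^F}[y_{m-1} = i \wedge y_m = j] \\
&= \lim_F \sum_{m=1}^{\infty} \sum_{i \in S-F} (\mu^F P^{m-1})_i P_{ij} \\
&= \lim_F \sum_{i \notin F} \nu^F_i P_{ij} \\
&\le \lim_F \sum_{i \notin F} \nu_i P_{ij}.
\end{aligned}
$$

Since $\sum_i \nu_i P_{ij}$ is convergent, the right side is zero. Hence
$\Gamma = 0$. Therefore $\{x_n\}$ is an extended stochastic process.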

(2) Verification of some of the properties of an extended chain. We
must now check that $\{x_n\}$ satisfies the four properties of an extended
chain. First we check (2). Consider the special case in which $E$ is the
finite set $\{0, 1, 2, \ldots, i\}$. This set has the property that on any
path beginning in a state of $E$ the statement $t > n$ implies the
statement that there is an $m > n$ with $y_m \in E$. Therefore

lim Pr[a:_^ e E for some Z > n] 

n 

00 

= lim 2 Pr[x_, e E A + i A • • • A x_^ ^ E] 

n l = n 

00 

= lim 2 P^AVt-i eP A • • • A «/(_„ ^ E] 

n i = n 

= lim Pr^£[i/f_i G E for some I > n'\ 

n 

< lim Pr^E[t > n] 

n 

< lim Pr^£[(3m) with m > n and g E\ 

n 

Since $P$ is transient and $\mu^E$ is finite, the right side is zero.
Hence so is the left side. By Corollary 1-17 we conclude that

$$\Pr[u_E = -\infty] = 0.$$

For the general case of a finite set $F$, choose a finite set $E$ of the
above form which contains it. Then

$$\Pr[u_F = -\infty] \le \Pr[u_E = -\infty] = 0.$$

Hence (2) holds.
Now suppose for the moment that we have proved (3) in the form
that for any finite set $E$ the process watched starting in $E$ is a
Markov chain with starting distribution $\mu^E$ and transition matrix $P$,
and suppose we have identified the given measure $\nu$ as the vector of
mean times in the various states. Then (4) follows, since $\nu$ was
assumed finite-valued. To prove (1) that the domain of $u_E$ has positive
measure, let $j \in E$. Then

$$0 < \nu_j = \nu^E_j = \sum_{i \in E} \mu^E_i N_{ij}.$$

Hence $\mu^E_i > 0$ for some $i \in E$. But $\mu^E \mathbf{1}$ is the
measure of the set on which $E$ is ever entered for the first time. That
is, it is the measure of the set where $u_E > -\infty$. Hence (1) holds.

Thus we are to prove that the process watched starting in the finite
set $E$ is a Markov chain with starting distribution $\mu^E$ and
transition matrix $P$, and we are to identify $\nu$. Suppose we have
proved this assertion for all finite sets of the form
$E = \{0, 1, 2, \ldots, e\}$, and suppose $F$ is an arbitrary finite set.
We show first how the assertion follows for $F$. Choose a set $E$ of the
special form containing $F$. Then watching the process beginning in $F$
is the same as watching the process watched starting in $E$ beginning in
$F$. Thus since the $E$-process is a Markov chain, so is the $F$-process
by Theorem 4-9. The starting distribution for the $F$-process is

= j] = 2 = i] = 2 

ieS ieS 

Hence we may assume that $E$ is a set of the form $E = \{0, 1, \ldots, e\}$.
Let $\{z_n\}$ be the outcome functions for the process watched starting
in $E$. We shall compute the probability of the typical basic statement
$p$ defined as $z_0 = i \wedge z_1 = j \wedge z_2 = k$, where $i \in E$, by
considering separately the contributions from paths of $\{x_n\}$ with
$u > -\infty$ and paths with $u = -\infty$. The notation $\tilde E$ will
mean $S - E$.

(3) Contribution to Pr[p] from paths with \x > — 00 . The contribution 
to Pr[^] of paths with u > — 00 is 

00 00 

2 2 = « A eE A ■■■ A e£ A x_^ = i 

n = - 00 m = 0 

A + i = j A a;_„ + 2 = k] 

00 00 

= 22 eE A ■■■ A ym-i eE A y,n = i A y„ + i = j 

n = - 00 m = 0 

A ym + 2 = k A t = m + n]. 

The fact that $E$ is of the special form $\{0, 1, \ldots, e\}$ means that if

$$y_0(\omega) \in \tilde E \wedge \cdots \wedge y_{m-1}(\omega) \in \tilde E \wedge y_m(\omega) = i,$$

then since $i \in E$ we must have $t(\omega) \ge m$. Hence we need sum on
$n$ only over $n \ge 0$. If $n \ge 0$, then for any $\omega$ for which
$y_0, \ldots, y_{m-1}$ are not in $E$, we have that $t(\omega) = m + n$ if
and only if $t(\omega_m) = n$. Therefore we may apply Theorem 4-9 to
$\Pr[\,\cdot\,]$. In symbols the argument is that the contribution is

= 22 e£ A ■■■ A t = m + n] 

n ^ 0 m ^ 0 

= 22 e£ A ■■■ A eE A y^ = i] 

n^O m^O 

X Pri[2/i = j A t/2 = ^ A t = »]. 

Let denote the matrix of probabilities of entry to E at time m. 
The above expression is 

= 22 = j A ^2 = ^ A t = w] 

n^O 0 

= 2 = i A t/2 = A: A t = w] 

n ^ 0 

= Pr([2/i = j A 2/2 = *] 

= ^jP 

Thus the contribution is $(\mu B^E)_i P_{ij} P_{jk}$.

(4) Contribution to Pr[p\ from paths with u = — oo. The contribution 
to Pr[p] of paths with u = — cx) is 

C30 

2 lim Pr[x_n-m ^E A • • • A eE A x_„ = i 

n= - 00 m-><» 

A X_„+i = j A X_„+2 = 

= 2 lim 2 Pr[x_„_„ = s A + i eE A ■■■ A x_„ + 2 = 

n ^ sef 

= 2 2 PV[2/(-m-n = 5 A yt-m-n + 1 eE A ■■■ 

n ^ se£ t S 

A 2/(-„ + 2 = *] 

00 

= 2 2 2 Pv[2/i = 5 A 2/, + i A • • • 

n ^ S6jg ^ Z = 0 

A yi + m + 2 = kAt = l + m + n]. 

In the expression Pr„r[ — ], for fixed n we may assume that m is large, 
say m > \n\. Then t > Z. Now since E is of the special form 
{0, 1,. . e} and since i e E, state i is lower in the ordering on 8 than 
any of the states of E. Hence 

yi(co) = s A yi + i(oj) eE a • • ■ a t(cu) = I + m + n 
for $m > |n|$ if and only if

yo(co) e ^ A • • • A e 5 A yi(w) = s 

A 2/i+i(a») eE A • ■ • A t(o>) = I + m + n, 

where (? = {0, 1, . . . , i}. In the presence of the information 

yo((x}) G ^ A • • • A yi-i{o)) eQ A yi(o)) = s, 

it is true that t(co) = i + m + n if and only if t(co/) = m + Theorem 
4-9 now applies. The contribution to Pr[p] therefore is 

= 2 lim 2 2 e <? A • • • A y,_i e ^ A y, = s] 

n ^ ^ 1^0 

X Pr 5 [yi eE A ■■■ A y „+2 = k A t = m + n]. 

Next, we operate on the second factor. In the presence of the 
information yo{oj) = s A yi(oj) g^ A • • • A ym-ii<^) A ymi^) = h 
we know by the special form of E that t(o>) = m + n if and only if 
= n. Hence, by Theorem 4-9, we have 

Prs[i/i sE A ■ ■ ■ A y „+2 = kAi = m + n] 

= PfsCj/i eE A ■■■ A y„ = i]-Prj[«/i = j a ys = k A t = n]. 

The second factor on the right side of this last equation does not depend 
on m and therefore factors out. We may sum it on and we get 
Hence the contribution is 

= lim 2 lim 2 eO A ■ ■ ■ A yi.^&G A yi = s\ 

seg f 1^0 _ 

X Prs[2/i eE A ■■■ A y^ -= i]P,yP,7r 

The first factor here is 

2 e <5 A • • • A I/,_1 e (5 A y, = s] = jtxf + 2 ] 

1^0 l>0 

= /xf + 2 

l>0 

= 1.^ + ^NP),. 

Applying the identity °N — N — B^N of Lemma 8-17, we have 
a^p ^ _ b°N)P 

= y^NP - im^B^NP 
= y,^NP - ix°NP ii F D G 
= v^P - v°P. 
Since $\nu^F \to \nu$ monotonically and since $\mu^F_s \to \mu_s$, we have

$$\lim_F \sum_{l \ge 0} \Pr_{\mu^F}[y_0 \in \tilde G \wedge \cdots \wedge y_{l-1} \in \tilde G \wedge y_l = s]
  = \mu_s + (\nu P)_s - (\nu^G P)_s = \nu_s - (\nu^G P)_s$$

since $\nu(I - P) = \mu$. Thus the contribution is

= lim X {v - v°P)s eS A ■■■ A y„,_ieS A = i]PnPjk- 



m s€E 



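But

$$\Pr_s[y_1 \in \tilde E \wedge \cdots \wedge y_{m-1} \in \tilde E \wedge y_m = i]
  = ({}^E\!P^{\,m-1} P)_{si}.$$

If

$$P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix}, \quad \text{then} \quad
  {}^E\!P = \begin{pmatrix} 0 & 0 \\ 0 & Q \end{pmatrix}$$

and

$${}^E\!P^{\,m-1} P = \begin{pmatrix} 0 & 0 \\ Q^{m-1} R & Q^m \end{pmatrix}.$$

From this representation of ${}^E\!P^{\,m-1} P$, we see that the sum
$\sum_{s \in \tilde E}$ may now be extended over all of $S$, and the
contribution is

$$= \lim_m\, [(\nu - \nu^G P)({}^E\!P^{\,m-1} P)]_i\, P_{ij} P_{jk}.$$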
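The block form of the taboo product used here can be checked numerically. The sketch below is only an illustration: the 4-state matrix `P`, the choice of the set `E` as the first two states, and the exponent `m` are hypothetical values that do not come from the text. It builds the taboo matrix by zeroing every entry of `P` that enters or leaves `E`, and compares the product with the blocks $Q^{m-1}R$ and $Q^m$ for one value of $m \ge 2$.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matpow(A, k):
    """k-th power of a square matrix (k >= 0)."""
    n = len(A)
    out = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for _ in range(k):
        out = matmul(out, A)
    return out

# Hypothetical 4-state transient chain; the states of E are listed first.
P = [[0.1, 0.2, 0.3, 0.2],
     [0.2, 0.1, 0.1, 0.4],
     [0.3, 0.1, 0.2, 0.2],
     [0.1, 0.3, 0.2, 0.1]]
E = [0, 1]
rest = [2, 3]

# Taboo matrix: keep only the entries of P that stay inside S - E.
EP = [[P[r][c] if r in rest and c in rest else 0.0 for c in range(4)]
      for r in range(4)]

R = [[P[r][c] for c in E] for r in rest]       # block from S - E into E
Q = [[P[r][c] for c in rest] for r in rest]    # block inside S - E

m = 4
lhs = matmul(matpow(EP, m - 1), P)             # (taboo matrix)^(m-1) times P
lower_left = matmul(matpow(Q, m - 1), R)       # Q^(m-1) R
lower_right = matpow(Q, m)                     # Q^m

def close(x, y, tol=1e-12):
    return abs(x - y) < tol

block_ok = (
    all(close(lhs[r][c], 0.0) for r in E for c in range(4))
    and all(close(lhs[2 + a][b], lower_left[a][b])
            for a in range(2) for b in range(2))
    and all(close(lhs[2 + a][2 + b], lower_right[a][b])
            for a in range(2) for b in range(2))
)
print(block_ok)
```

Because the rows indexed by $E$ vanish for $m \ge 2$, adding the terms with $s \in E$ to the sum changes nothing, which is the step the argument relies on.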



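Now

$$\nu^G P\,({}^E\!P^{\,m-1})\,P \le \nu^G P^{m+1} \to 0,$$

since $\nu^G$ is a potential. Therefore the contribution is

$$= \lim_m\, (\nu\, {}^E\!P^{\,m-1} P)_i\, (P_{ij} P_{jk}).$$

Since $i$ is a state of $E$, this expression is

$$= \lim_m\, (\nu_{\tilde E}\, Q^{m-1} R)_i\, (P_{ij} P_{jk}).$$

From the identity $\nu = \nu P + \mu$, we have

$$\nu_{\tilde E} = \nu_E U + \nu_{\tilde E} Q + \mu_{\tilde E}$$

or

$$\nu_{\tilde E} Q = \nu_{\tilde E} - (\nu_E U + \mu_{\tilde E}).$$

Iterating, we find that

$$\nu_{\tilde E} Q^{m-1} = \nu_{\tilde E} - (\nu_E U + \mu_{\tilde E})(I + Q + \cdots + Q^{m-2})$$

and hence

$$\nu_{\tilde E} Q^{m-1} R = \nu_{\tilde E} R - (\nu_E U + \mu_{\tilde E})(I + Q + \cdots + Q^{m-2}) R.$$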
By monotone convergence,

Mmv^Q^-^R = v^R - (v^U + H'sfNgR, 

m 

As we noted in Section 2, the balayage potential satisfies 

V® = (v^ V,U ^Ng). 

Hence 

lim = {v - v^)gR - fig^NgR. 

m 

If we twice use the fact that v and agree on E, we obtain 
(v - v^),R = [{v - v^)P]^ = [(v - - (v^ - 

Thus 

lim vgQ^ ^R = ^R) — 

m 

We conclude that the contribution to $\Pr[p]$ is

$$= (\mu^E - \mu B^E)_i P_{ij} P_{jk}.$$

(5) Completion of proof. From steps 3 and 4, we find that

$$\Pr[z_0 = i \wedge z_1 = j \wedge z_2 = k] = \mu^E_i P_{ij} P_{jk} \quad \text{for } i \in E.$$

Hence

$$\Pr[z_2 = k \mid z_0 = i \wedge z_1 = j] = P_{jk}$$

if $\Pr[z_0 = i \wedge z_1 = j] > 0$. That is, $\{z_n\}$ is a Markov chain
with matrix $P$ and with starting vector $\mu^E$, provided
$E = \{0, 1, \ldots, e\}$. We have seen how the proof for a general finite
set follows from this special case.

We must compute the mean number of times in state $j$ in $\{x_n\}$. The
mean number starting in $E$ is $(\mu^E N)_j = \nu^E_j$. Hence the mean
total number is $\lim_E \nu^E_j = \nu_j$ by monotone convergence.

Finally we show that the contribution to $\nu$ of the paths with
$u > -\infty$ is $\mu N$. On these paths the process behaves as if it were
started with distribution $\mu B^E$ (see step 3 if
$E = \{0, 1, \ldots, e\}$ and use the identity $B^E B^F = B^F$, where
$E \supset F$, in the general case). Therefore the contribution is

$$\lim_E\, (\mu B^E) N = \mu N - \lim_E\, \mu\, {}^E\!N$$

by Lemma 8-17. Since ${}^E\!N \le N$ and $\lim_E {}^E\!N = 0$,
$\mu\, {}^E\!N \to 0$ by dominated convergence. Thus the contribution is

$$\lim_E\, (\mu B^E) N = \mu N.$$
13. Examples

Example 1: Basic example. 

For the basic example, we recall from Chapter 5 that 






[h 

iSoo 

h 

.^00 



if i < j 



i8i 



if i > j. 



In a process of this kind in which all pairs of states communicate, there
is no gain in choosing a complicated starting vector $\pi$. Indeed, if we
choose $\pi$ to be a unit mass at 0, then all finite-valued $P$-regular
functions are certainly $\pi$-integrable, and therefore no other choice of
$\pi$ will make the representation theorem yield additional regular
functions.

With $\pi$ chosen as a unit mass at 0, $K(i,j)$ becomes






0 } 



n 

Pi - 



I Pi 



a i < j 
if i > j. 



Since ^ is of a particularly simple form, it is possible to compute the 
metric for the exit boundary. If j < j', then 



d(j,j') = ^WiNoi\K(i,j) - K(i,j')\ 

i 



- 1 



, 

'P 




M 

Pi) 




= 2 '^i- 

The metric space (S, d ) thus is isometric to the subset of the real line 
consisting of the points 0, w^, -f W 2 , • • •• 
Cauchy completion contains the one extra point corresponding to 

Alternatively we can find $S^*$ directly without using the metric. In
fact, if $\{j_n\}$ is a sequence of points in $S$, then $\{K(i, j_n)\}$ is
clearly Cauchy if and only if either $\{j_n\}$ is eventually constant or
$\{j_n\}$ tends to infinity.

Either way, there is exactly one boundary point, which we may call
$+\infty$, and the relative topology for $S$ is discrete. Moreover,

$$K(i, +\infty) = \lim_n K(i, j_n) = 1.$$



Since 1 is regular and since there is only one boundary point, the
boundary point must be extreme. Moreover, every regular function
$h \ge 0$ must be constant. Since the process does not disappear,
harmonic measure $\mu$ must assign unit mass to $+\infty$.

Next, we consider the entrance boundary. If / is a unit mass at 0, 
then J is given by 






Nn 

jO 



%l^o 

Pi I ( Po 
fiJ 1^00 



if j = 0 
if j > 0, j > i 

if j > 0, j < i. 



The sequence 1, 2, 3, . . . again is Cauchy, so that there is necessarily
exactly one limit point in ${}^*S$. But

$$\lim_{n \to \infty} J(n, i) = J(0, i),$$

and consequently ${}^*S = S$ and the entrance boundary is empty. The
state 0 is a limit point and thus $S$ does not have the discrete topology.
Since the entrance boundary is empty, there are no non-zero regular
measures $\alpha \ge 0$. (We had arrived at this conclusion in Chapter 5,
too.)



Example 2: The p-q random walk. 

The process $P$ is the Markov chain with state space the integers which
moves one step to the right with probability $p$ or one step to the left
with probability $q = 1 - p$. We shall assume $0 < q < p < 1$. A
calculation like that in Section 5-8 shows that

$$H_{ij} = \begin{cases} 1 & \text{if } i \le j \\ (q/p)^{i-j} & \text{if } i \ge j \end{cases}$$

and then

$$N_{ij} = \begin{cases} (p - q)^{-1} & \text{if } i \le j \\ (q/p)^{i-j}(p - q)^{-1} & \text{if } i \ge j. \end{cases}$$

Take $\pi$ and $f$ to be unit masses at state 0; as in Example 1, we may
make these choices since all pairs of states communicate. We obtain

$$K(i,j) = \frac{N_{ij}}{N_{0j}} = \begin{cases} 1 & \text{for } j \ge i \text{ and } j \ge 0 \\ (q/p)^i & \text{for } j \le i \text{ and } j \le 0. \end{cases}$$

There are two distinct infinite Cauchy sequences, one corresponding to
$+\infty$ and the other corresponding to $-\infty$. For these points,

$$K(i, +\infty) = 1$$

$$K(i, -\infty) = (q/p)^i.$$

Both functions are regular and minimal, and thus the boundary
contains two points, both extreme. No point of $S$ is a limit point.
Since the process goes to $+\infty$ with probability one, we have
$\mu(\{+\infty\}) = 1$. (The concentration of $\mu$ can also be deduced
analytically from the representation of 1.)
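The two boundary functions of the p-q random walk can be checked with exact arithmetic. The sketch below is an illustration only: it assumes the closed form of $N$ for $0 < q < p < 1$ (namely $(p-q)^{-1}$ for $i \le j$ and $(q/p)^{i-j}(p-q)^{-1}$ for $i \ge j$) and takes $\pi$ to be a unit mass at 0. The helper names `green`, `martin_kernel`, and `is_regular_at`, and the value $p = 2/3$, are hypothetical choices made for the example.

```python
from fractions import Fraction

def green(i, j, p):
    """N_ij for the p-q walk with 0 < q < p < 1 (closed form assumed above)."""
    q = 1 - p
    if i <= j:
        return 1 / (p - q)
    return (q / p) ** (i - j) / (p - q)

def martin_kernel(i, j, p):
    """K(i, j) = N_ij / N_0j when pi is a unit mass at state 0."""
    return green(i, j, p) / green(0, j, p)

def is_regular_at(h, i, p):
    """Check the regularity equation h(i) = p h(i+1) + q h(i-1) at state i."""
    q = 1 - p
    return h(i) == p * h(i + 1) + q * h(i - 1)

p = Fraction(2, 3)
q = 1 - p
states = range(-3, 4)

# Far to the right the kernel is identically 1; far to the left it is (q/p)^i.
k_plus = [martin_kernel(i, 200, p) for i in states]
k_minus = [martin_kernel(i, -200, p) for i in states]

print(k_plus)
print(k_minus)
```

Since the kernel is already constant in $j$ once $j$ lies beyond both $i$ and 0, evaluating it at a single far-away state gives the boundary value exactly.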

The entrance boundary is treated most quickly by reversing the
process with respect to the regular measure $\alpha = \mathbf{1}^T$. Then
$\hat P$ is a process of the same type but with the roles of $p$ and $q$
interchanged. Consequently

$$J(-\infty, i) = 1$$

$$J(+\infty, i) = (p/q)^i,$$

and — oo}) = /2({ — oo}) = 1. 

The function p{x), which does not depend upon a, satisfies 

flE J 

p(— oo) = lim lim = lim ~p — ^- 

For p(+oo), we note that if j is large, then hf = where m is the 
last state in E, Also 



^ (qlpy~"' ^ (qip) 
Nio ip - q)~Hqlpy ip - 

As P \ a8, +00 and therefore 



p(+oo) = 0. 

Any extended chain representing starts at — oo with probability 
p — q and goes to +oo. 

The p-q random walk can be used to show what happens if $\pi$ is
chosen in such a way that there is a $P$-regular function which is not
$\pi$-integrable. In fact, set

$$\pi_i = \begin{cases} (p/q)^i\, \dfrac{p - q}{p} & \text{for } i \le 0 \\ 0 & \text{for } i > 0. \end{cases}$$

Then $\pi \mathbf{1} = 1$ and the regular function $(q/p)^i$ will not be
integrable. We must recompute $K(i,j)$. First we note that for $j > 0$,

$$(\pi N)_j = \sum_{i \le 0} \pi_i N_{ij} = (p - q)^{-1} \sum_{i \le 0} \pi_i = (p - q)^{-1}$$

and that for $j \le 0$,

$$
\begin{aligned}
(\pi N)_j = \sum_{i \le 0} \pi_i N_{ij}
&= p^{-1} \sum_{i \le j} (p/q)^i + p^{-1} \sum_{j < i \le 0} (p/q)^i (q/p)^{i-j} \\
&= (p - q)^{-1}(p/q)^j + p^{-1}|j|(p/q)^j.
\end{aligned}
$$

Then

$$K(i,j) = \frac{N_{ij}}{(\pi N)_j} = \begin{cases} 1 & \text{for } j \ge i \text{ and } j > 0 \\[4pt] \dfrac{(p - q)^{-1}(q/p)^i}{(p - q)^{-1} + p^{-1}|j|} & \text{for } j < i \text{ and } j \le 0. \end{cases}$$

Again there are two boundary points $+\infty$ and $-\infty$, but this time

$$K(i, +\infty) = 1$$

and

$$K(i, -\infty) = 0.$$

Thus $-\infty$ is not an extreme boundary point. The nonintegrability of
$(q/p)^i$ introduced a boundary point whose associated function was
identically zero. This example is typical of the general situation, but
we shall not pursue the details.
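The appearance of the zero boundary function can be watched numerically. The sketch below rests on assumptions that should be read as such: the starting vector is taken to be $\pi_i = (p/q)^i (p-q)/p$ for $i \le 0$ and $0$ for $i > 0$ (a form reconstructed so that $\pi \mathbf{1} = 1$), the closed form $(\pi N)_j = (p/q)^j[(p-q)^{-1} + |j|/p]$ for $j \le 0$ is derived from it, and $p = 2/3$ is an arbitrary choice. The function names are hypothetical.

```python
from fractions import Fraction

p = Fraction(2, 3)
q = 1 - p

def green(i, j):
    """N_ij for the p-q walk with 0 < q < p < 1."""
    if i <= j:
        return 1 / (p - q)
    return (q / p) ** (i - j) / (p - q)

def pi_weight(i):
    """Assumed starting vector: (p/q)^i (p - q)/p for i <= 0, else 0."""
    return (p / q) ** i * (p - q) / p if i <= 0 else Fraction(0)

def pi_N_closed(j):
    """Closed form of (pi N)_j under the assumption above."""
    if j > 0:
        return 1 / (p - q)
    return (p / q) ** j * (1 / (p - q) + Fraction(abs(j)) / p)

def pi_N_truncated(j, cutoff=200):
    """Direct summation of pi_i N_ij over -cutoff <= i <= 0."""
    return sum(pi_weight(i) * green(i, j) for i in range(-cutoff, 1))

def kernel(i, j):
    """K(i, j) = N_ij / (pi N)_j."""
    return green(i, j) / pi_N_closed(j)

# The kernel at state 0 shrinks toward 0 as j moves to the left.
left_values = [kernel(0, -n) for n in (1, 10, 100, 1000)]

# Partial sums of pi against (q/p)^i grow without bound (non-integrability).
partial_sums = [sum(pi_weight(i) * (q / p) ** i for i in range(-n, 1))
                for n in (10, 100)]
print([float(v) for v in left_values], partial_sums)
```

The linear factor $|j|$ in $(\pi N)_j$ is what forces $K(i, j) \to 0$ as $j \to -\infty$, in contrast with the unit-mass choice of $\pi$.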

There is still a second way a zero boundary point can arise, and the
degenerate case where $p = 1$ and $q = 0$ in the example above illustrates
the point. If $h$ is regular for this process, then

$$h_i = h_{i+1}$$

and hence $h$ must be constant. On the other hand, it is readily seen
that, for any choice of $\pi$ for which $\pi N > 0$, there are two boundary
points $+\infty$ and $-\infty$ and that

$$K(i, +\infty) = 1$$

and

$$K(i, -\infty) = 0.$$

Thus again there is a zero boundary point, but this time no function is
missing.

Example 3: Symmetric random walk in three dimensions. 

The symmetric random walk in three dimensions is the sums of
independent random variables process on the integer lattice in
three-dimensional space with transition probability $\tfrac{1}{6}$ from
any point to each of its six neighbors. We recall from Proposition 5-20
that the only bounded regular functions for this process are the
constants. We now prove, using in part the methods of transient boundary
theory, the deeper result that the only non-negative regular functions
are the constants.

In fact, if we choose $\pi$ to be a unit mass at 0, then every regular
function is $\pi$-integrable, and the representation theorem assures us that






it is enough to prove that $K(\cdot, x) = 1$ for every $x$ in the boundary.
That is, it suffices to prove that for every $i$

$$\lim_{j \to \infty} K(i,j) = 1.$$

We shall prove in a moment the following estimate for $N_{0j}$: If
$j \ne 0$, then

$$N_{0j} = \frac{3}{2\pi|j|} + O(|j|^{-2}).$$

Once we have this estimate, we also have, for $i \ne j$,

$$K(i,j) = \frac{N_{ij}}{N_{0j}}
  = \frac{\dfrac{3}{2\pi|j - i|} + O(|j - i|^{-2})}{\dfrac{3}{2\pi|j|} + O(|j|^{-2})}
  = \frac{\dfrac{|j|}{|j - i|} + O(|j|^{-1})}{1 + O(|j|^{-1})}.$$

Hence, $\lim_j K(i,j) = 1$, as asserted.

Thus we are to prove the estimate for $N_{0j}$. We first show that

$$N_{0j} = 3 \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \int_{-1/2}^{1/2}
  \frac{e^{-2\pi i\, j \cdot x}\, dx_1\, dx_2\, dx_3}{3 - \cos 2\pi x_1 - \cos 2\pi x_2 - \cos 2\pi x_3},$$

where $x = (x_1, x_2, x_3)$. Call the right side of the equation $h_j$. The
singularity in the denominator of the integrand is of the order of

$$\frac{1}{3 - (1 - 2\pi^2 x_1^2) - (1 - 2\pi^2 x_2^2) - (1 - 2\pi^2 x_3^2)} = \frac{1}{2\pi^2 |x|^2}$$

near $x = 0$, and $|x|^{-2}$ is integrable in a neighborhood of $x = 0$.
Thus $h_j$ is certainly well defined, and moreover $h_j \to 0$ as
$j \to \infty$ by the Riemann-Lebesgue Lemma. We claim that

$$[(I - P)h]_m = \delta_{m0}.$$
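The quadratic approximation to the denominator near $x = 0$, which is what makes the integrand behave like $|x|^{-2}$ there, can be checked numerically. This is only a sanity check under stated assumptions: the sample direction $(1, 1/2, -1/4)$ and the scales are arbitrary choices, and the function names are hypothetical.

```python
import math

def denominator(x1, x2, x3):
    """3 - cos(2 pi x1) - cos(2 pi x2) - cos(2 pi x3)."""
    return (3
            - math.cos(2 * math.pi * x1)
            - math.cos(2 * math.pi * x2)
            - math.cos(2 * math.pi * x3))

def quadratic_approx(x1, x2, x3):
    """Leading term 2 pi^2 |x|^2 of the denominator near the origin."""
    return 2 * math.pi ** 2 * (x1 * x1 + x2 * x2 + x3 * x3)

def ratio(x):
    """Denominator divided by its quadratic approximation at the point x."""
    return denominator(*x) / quadratic_approx(*x)

# Approach the origin along one fixed (arbitrary) direction.
scales = (1e-1, 1e-2, 1e-3)
ratios = [ratio((s, 0.5 * s, -0.25 * s)) for s in scales]
print(ratios)
```

The ratios approach 1 from below, with a gap of order $|x|^2$ coming from the fourth-order terms of the cosine expansions.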

Call the six neighbors of 0 by the names and call the region 

$$-\tfrac{1}{2} \le x_1, x_2, x_3 \le \tfrac{1}{2}$$

by the name $E$. Then

(/ - P)„A = + 



= 3 (3 — cos 2ttx-^ — cos 2ttX2 — cos 2ttx^) ^ 

JE 

= 3 (3 — cos 2ttx-^ — cos 277X2 — 277X3)“^ 

JE 

=J, 



e-^^im’^dx 



— ^mO* 



Now $N_{0\cdot} = N_{\cdot 0}$ is also a function tending to zero (see
Proposition 7-10) and satisfying

$$[(I - P) N_{\cdot 0}]_m = \delta_{m0},$$

and thus $N_{\cdot 0} - h$ is a bounded regular function. By Proposition
5-20 it must be constant, and that constant must be zero, since both
functions vanish at infinity. Thus the formula for $N_{0j}$ is established.

We now have $N_{0j}$ exhibited as the $j$th Fourier coefficient of a
certain periodic function of three variables. The remainder of the proof
will presuppose some knowledge of Fourier transforms and tempered
distributions.

Let $\varphi(x)$ be an infinitely differentiable function defined on
$R^3$ such that

(1) $0 \le \varphi \le 1$.

(2) $\varphi(x) = 1$ for $x$ in a small neighborhood of 0.

(3) $\varphi(x) = 0$ outside of a slightly larger neighborhood of 0.

For simplicity, let $\|x\|^4 = x_1^4 + x_2^4 + x_3^4$. Denote by $f(x)$ the
periodic continuation of the function

$$f(x) = \varphi(x) \left[ \frac{3}{2\pi^2 |x|^2} + \frac{\|x\|^4}{2|x|^4} \right].$$

The difference

$$f(x) - \frac{3}{3 - \cos 2\pi x_1 - \cos 2\pi x_2 - \cos 2\pi x_3}$$

is a bounded periodic function which is infinitely differentiable away
from 0 and is $O(|x|^2)$ near 0. Moreover, its Laplacian is integrable in
$E - \{0\}$. These facts imply that the Fourier coefficients of the
difference are $O(|j|^{-2})$ as $j \to \infty$. Hence it is enough to show
that the Fourier coefficients of $f$ are equal to
$3/(2\pi|j|) + O(|j|^{-2})$.
It is clear from the definitions that the Fourier coefficients of / 
satisfy 

I = f f{x)e~^^^^'^dx, 

J E J 



Here the right side is the Fourier transform of / evaluated at j. We are 
thus to estimate the Fourier transform off. Write 



f(x) = [cp(x) - 





3 IWl^ - fkh 3 
2tt^\x\^ 21x1* ^ lO’ 



and take the Fourier transform of both sides in the sense of distributions. 
We shall consider the four terms on the right separately. 

The transform of the constant is the distribution defined by the 
measure which assigns weight to the origin. It has the property 
that it is supported, say, entirely in the unit ball. 

In the third term, the numerator is a homogeneous harmonic 
polynomial, and the transform of the term is known from Fourier 
analysis to be 

2i»r' J’ 

where 

(PV[gr], 4i) = lim I* g{x)ip{x)dx. 

J \x\^€ 

This distribution has the property that it is the sum of a function and 
a distribution which is supported in the unit ball. The function is 



isMLzikl! 

877 21 ^ 1 ’ 



for \y\ > 1, 



and it is 0{\y \ as 2 / oo. 

The transform of the second term is known from Fourier analysis 
to be the distribution arising from the function 



3 
We come to the first term. Since the transform of the left side is a
function, the transform of the first term must be the sum of a function 
and a distribution supported in the unit ball. On the other hand, the 
first term is an infinitely differentiable function. If we iterate taking 
the Laplacian of it enough times, the resulting function is integrable 
and has a bounded function as Fourier transform. Since the opera- 
tion of taking the Laplacian goes into multiplication by — 
under the Fourier transform, we obtain the inequality of functions 

(^TT^\y\^Y FT(first term) < k sufficiently large. 

This result implies that the function part of the distribution 
FT(first term) is 0(\y\ as i/ — > oo. 

Since the sum of the transforms of the four terms is a function, we 
must have, outside the unit ball, 

mf)iy) = 0{\y\-^’^) + ^ + 0{\y\-^). 

We conclude therefore that 

and hence that 

as required. 

Example 4: 

This example will be a process with an exit boundary point $x$ such
that $K(\cdot, x)$ is regular and normalized but is not minimal.

The state space will be

$$S = \{a_i, a'_i, b_i \mid i = 0, 1, 2, \ldots\}$$

and the non-zero transition probabilities are 



_ 1 a( ~ Pii 


= 1 


- Pi 




t 

II 




II 


^a'i _ 1 _ 1 Qi 1 


- Pi- 



The picture for this process is as follows. 








Let 

^0 = L A = fl ^=0 = lim A> 

k = l 

i 

yo = L yi = n 

fc = l 
i 

a_i = 0, (^i = 2 Mfe + i/y*:. <^00 - lim 

fc = 0 

We shall assume for this example that a^o = +oo. At the end of the 
discussion of the example we shall show that this condition is possible. 
Any state can be reached either from Uq or from Uq. Thus if we set 

'^tto “ '^a'o ~ 

then ttN is strictly positive. Since the process is in any given state at 



most once, we must have H — N. 


It is clear that 




ro 


if j < i 


TJ TJ 


Wi 


if j > i. 




ro 


if j < i 


= 


\vilYi 


if j > i, 


and = 0 = = 




We must compute 



(and JTai'by)- Ifj < h then = 0. Otherwise we obtain a set of 

alternatives by considering the first time the process switches from 
the a-row to the 6-row. Then 



^aibj 2 ^(iia,^Qk + l^bicbj* 



Substituting, we find 



} 

1 

k = i 

(0 



^aibf 



li 

liS, 



if j < { 

((T, - aj_i) 
In determining the exit boundary we use the observation that
whenever finitely many Cauchy sequences exhaust $S$, then those Cauchy
sequences define all of the boundary points. In the present example
the claim is that the three sequences $\{a_j\}$, $\{a'_j\}$, and $\{b_j\}$
are each Cauchy sequences.

First consider K( • , a^) for large j. We have 
K{a[, aj) = K(bi, a^) = 0 

and, for i < j, 

V(n n \ — 

^j) — Tj ~~ 1 zi o 

^Jiaj 2-^aoaj Pi 

Hence = lim Uy is a boundary point, and it satisfies 

2 

K(ai, ttoo) = w and K(a[, a^) = K{b^, a^) = 0. 

Pi 

We can check directly that • , a^o) is a regular function; moreover, it 
is normalized because 

7rK(^,a^) = \K{ao,a^) = = 1 . 

Similarly we find that {a'} is a Cauchy sequence. If we call its limit 
a'oc, then 



K{a[, a'^) = and Kia^, a'^) = K(b^, a'^) 
Pi 



0. 



Since K(-,a^) / K{-,a'^), and a'^ are distinct boundary points. 
We see also that K{ -, a'^) is regular and normalized. 

Finally we consider the sequence {bj] for large j. For i < j, we have 



and 



K(ai, bj) = 

K(b,^ 6y) = 



bj 


^aibj _ 


1 /or, - aj 




^aobj 


Pi \ 




_ 


1 






yiO-; 


— 4" QO 5 


we have 





and 



lim K(ai, bj) = lim K(a[, bj) = ^ 
j j Pi 

lim K{bi, bj) = 0. 



Therefore {bj} is Cauchy. Denoting its limit in S* by b^, we conclude 
K{ai, b^) = K{al b^) = 1/^^ and K(bi, b^) = 0. 

We can check that K{- ,b^) h regular and normalized. 
Thus there are exactly three boundary points, and each is associated
with a regular normalized function. But

$$K(\cdot, b_\infty) = \tfrac{1}{2} K(\cdot, a_\infty) + \tfrac{1}{2} K(\cdot, a'_\infty),$$

and $b_\infty$ therefore cannot be an extreme point. Then by the
representation theorem all non-negative regular functions are generated by
$K(\cdot, a_\infty)$ and $K(\cdot, a'_\infty)$. Hence their linear
independence as functions on $S$ implies that they are both minimal. That
is, $B_e = \{a_\infty, a'_\infty\}$.

In this example, is positive if and only if harmonic measure assigns 
positive weight to the boundary. In fact. 

The remainder of the weight is put on the states from which the process 
can disappear. We obtain 

= Pr^[process disappears from bj] = (yy — yy + i)cry. 

Note that if = 0, then there exist non-negative regular functions, 
but none of them is bounded. 

We still must show that the condition o-oo = +00 puts no restriction 
on whether must be positive. To get ^^0 = take 



p. = q. = = I 

Then j 3 y = yy = 2~^ and ay = ^(j + 1). Hence = +00. To get 
j8oo > 0, take 



_ jjj + 2) 
{j + 1)" 



= 1 - Pi = 77 



O' + 1 )=^ 



j + 1 

r . =z 

^ i + 2 



Then 



and 



l 8 y = 



i + 2 
20 + 1 )’ 



^CO - 2’ 



yi = j 



j + 2 



= V i + ^ 1 j + 2 ^ ^ 1 

, 420 + 1)0 + 2)2 2 , 4 , 40 + 1 ) 



= +00. 



Example 5: 

This example will be a process in which $S^*$ has every point of $S$ as a
limit point.

Let $S$ be the set of all finite strings of positive integers of the form

$$(1, k_1, k_2, \ldots, k_n)$$

as $n$ and the $k$'s vary. The transition probability from
$(1, k_1, \ldots, k_n)$ to $(1, k_1, \ldots, k_n, m)$ is to be $2^{-m}$,
and all other transition probabilities are zero.

Since each state is entered at most once, we have $N = H$. The
hitting probability from

$$I = (1, j_1, \ldots, j_n)$$

to a distinct state

$$J = (1, k_1, \ldots, k_{n'})$$

is

$$H_{IJ} = \begin{cases} 2^{-(k_{n+1} + \cdots + k_{n'})} & \text{if } n < n' \text{ and } j_1 = k_1, \ldots, j_n = k_n \\ 0 & \text{otherwise.} \end{cases}$$

Take $\pi$ to be a unit mass at $(1)$. If

$$J = (1, k_1, \ldots, k_n), \qquad J_m = (1, k_1, \ldots, k_n, m),$$

the claim is that

$$\lim_{m \to \infty} K(I, J_m) = K(I, J)$$

for all $I$. It would therefore follow that every point $J$ in $S$ is a
limit point of $S^*$.

For the proof we observe first that $K(I, J) = K(I, J_m) = 0$ unless $I$
is an initial segment of some $J_m$. Also if $I = J_m$ for some $m$, then

$$K(I, J) = \lim_{m'} K(I, J_{m'}) = 0.$$

Thus we may suppose that $I$ is an initial segment of $J$. In this case, if
$I = (1, k_1, \ldots, k_s)$,

$$K(I, J_m) = \frac{N_{I J_m}}{N_{(1) J_m}} = 2^{k_1 + \cdots + k_s}$$

and

$$K(I, J) = \frac{N_{IJ}}{N_{(1) J}} = 2^{k_1 + \cdots + k_s}.$$

Hence $K(I, J) = \lim_m K(I, J_m)$ always.
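The claim that $K(I, J_m)$ does not depend on $m$ can be illustrated with exact arithmetic. The sketch below is hedged: it assumes that the transition probability for appending the integer $m$ to a string is $2^{-m}$ (a value consistent with the powers of 2 in the kernel, but an assumption nonetheless), it represents states as Python tuples beginning with 1, and the function names and sample strings are hypothetical.

```python
from fractions import Fraction

def hitting(I, J):
    """H_IJ: chance that the chain started at string I ever reaches string J.

    Assumes appending the integer k has probability 2^(-k). Since each state
    is entered at most once, N = H, with H_II = 1.
    """
    if len(I) > len(J) or J[:len(I)] != I:
        return Fraction(0)
    # Each appended integer k contributes a factor 2^(-k).
    return Fraction(1, 2 ** sum(J[len(I):]))

def martin_kernel(I, J, root=(1,)):
    """K(I, J) = N_IJ / N_(1)J, with pi a unit mass at the string (1)."""
    return hitting(I, J) / hitting(root, J)

I = (1, 3, 2)            # an initial segment of J
J = (1, 3, 2, 5)
other = (1, 4)           # not an initial segment of J

along_sequence = [martin_kernel(I, J + (m,)) for m in range(1, 21)]
print(martin_kernel(I, J), along_sequence[0], martin_kernel(other, J))
```

For an initial segment $I = (1, k_1, \ldots, k_s)$ the kernel equals $2^{k_1 + \cdots + k_s}$ at $J$ and at every $J_m$, so the sequence $\{J_m\}$ converges to the point $J$ of $S$ itself.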



Example 6: Space-time coin tossing. 

Let $S$ be the set of lattice points $(n, i)$ in the plane with
$0 \le i \le n$. A point of $S$ is to be identified with $i$ heads in $n$
tosses of a fair coin. Thus we take

$$P_{(n,i),(n+1,i)} = P_{(n,i),(n+1,i+1)} = \tfrac{1}{2}$$

with all other transition probabilities equal to zero. In this example,
$N = H$ and also

$$N_{(m,i),(n,j)} = \begin{cases} 2^{-(n-m)} \dbinom{n - m}{j - i} & \text{if } n \ge m,\ j \ge i \\ 0 & \text{otherwise.} \end{cases}$$

Let $\pi$ be a unit mass at $(0, 0)$. Then $\pi N > 0$ since every state
can be reached from $(0, 0)$. We obtain

$$K((m, i), (n, j)) = \begin{cases} 2^m \dbinom{n - m}{j - i} \Big/ \dbinom{n}{j} & \text{if } n \ge m,\ j \ge i \\ 0 & \text{otherwise.} \end{cases}$$



As a first step in obtaining the exit boundary, we shall prove that if
$\{(n_k, j_k)\}$ is Cauchy, then $\lim_k j_k/n_k$ exists. In fact, set
$i = 0$ and $m = 1$ and consider, for $n_k \ge 1$, the expression

$$K((1, 0), (n_k, j_k)) = 2 \binom{n_k - 1}{j_k} \Big/ \binom{n_k}{j_k}
  = \frac{2(n_k - j_k)}{n_k} = 2 - 2\, \frac{j_k}{n_k}.$$

As $k \to \infty$, the left side converges; hence so does the right side,
and $\lim_k j_k/n_k$ must exist.

Conversely suppose that $\{(n_k, j_k)\}$ is an infinite sequence such that
$t = \lim_k j_k/n_k$ exists. We claim that $\{(n_k, j_k)\}$ is Cauchy. Fix
$(m, i)$ and denote by $(n, j)$ a term of the sequence $\{(n_k, j_k)\}$.
Then

$$K((m, i), (n, j)) = 2^m \binom{n - m}{j - i} \Big/ \binom{n}{j}
  = 2^m\, \frac{[(j) \cdots (j - i + 1)][(n - j) \cdots (n - j - m + i + 1)]}{(n) \cdots (n - m + 1)}.$$

Noting that both numerator and denominator have exactly $m$ factors
and that $m$ is fixed, divide the numerator and the denominator by $n^m$
and pass to the limit in each factor separately. In each factor we have
$j/n \to t$ and, for instance, $(i - 1)/n \to 0$. We get

$$\lim K((m, i), (n, j)) = 2^m t^i (1 - t)^{m-i}.$$

Therefore the classes of infinite Cauchy sequences are in one-to-one
correspondence with the rays to infinity, that is, in one-to-one
correspondence with points $t$ in $[0, 1]$. The functions associated to
these sequences are

$$K((m, i), t) = 2^m t^i (1 - t)^{m-i}.$$

It is easy to check that all of these functions are regular and normalized,
and hence the points in question form the entire boundary $B$ (and
nothing more).
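The convergence of the kernel along a ray and the regularity of the limit can both be checked directly. The sketch below is illustrative only: it uses the expression $2^m \binom{n-m}{j-i} / \binom{n}{j}$ for the kernel, written as the $m$-factor product from the argument above, and the sample values $m = 5$, $i = 2$, $t = 0.3$ as well as the function names are arbitrary, hypothetical choices.

```python
from fractions import Fraction

def martin_kernel(m, i, n, j):
    """K((m,i),(n,j)) = 2^m C(n-m, j-i) / C(n, j), as an m-factor product."""
    if n < m or j < i or j - i > n - m:
        return Fraction(0)
    num = 1
    for r in range(i):            # j (j-1) ... (j-i+1)
        num *= j - r
    for r in range(m - i):        # (n-j) (n-j-1) ... (n-j-m+i+1)
        num *= n - j - r
    den = 1
    for r in range(m):            # n (n-1) ... (n-m+1)
        den *= n - r
    return Fraction(2 ** m * num, den)

def limit_kernel(m, i, t):
    """Boundary function K((m,i), t) = 2^m t^i (1-t)^(m-i)."""
    return 2 ** m * t ** i * (1 - t) ** (m - i)

m, i, t = 5, 2, 0.3
sizes = (10 ** 2, 10 ** 4, 10 ** 6)
approximations = [float(martin_kernel(m, i, n, round(t * n))) for n in sizes]
target = limit_kernel(m, i, t)
errors = [abs(a - target) for a in approximations]

# Regularity of the limit: its value at (m, i) is the average of the values
# at the two successors (m+1, i) and (m+1, i+1); checked exactly at t = 3/10.
t_exact = Fraction(3, 10)
regular = (limit_kernel(m, i, t_exact)
           == Fraction(1, 2) * limit_kernel(m + 1, i, t_exact)
           + Fraction(1, 2) * limit_kernel(m + 1, i + 1, t_exact))
print(errors, regular)
```

Writing the kernel as a product of $m$ factors avoids forming very large binomial coefficients, and it mirrors the factor-by-factor passage to the limit used in the text.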

Next, we check that the topology of $B$ is the usual topology for the
unit interval. The map of the unit interval onto $B$ is continuous
since each function $K((m, i), \cdot)$ is continuous in the unit interval
topology. Since the map is one-one from a compact space onto a
Hausdorff space, it is a homeomorphism.

To see that every point of $B$ is extreme, we assume, on the contrary,
that $2^m t_0^i (1 - t_0)^{m-i}$ is not minimal. By Theorem 10-41 there is
a measure $\nu$ on the Borel sets of $B_e \subset [0, 1]$ with

$$2^m t_0^i (1 - t_0)^{m-i} = \int_{B_e} 2^m t^i (1 - t)^{m-i}\, d\nu(t).$$

Specializing to the case $m = i$ and extending $\nu$ to be defined on
$[0, 1]$, we find that

$$t_0^i = \int_0^1 t^i\, d\nu(t)$$

for all $i$. But by the Weierstrass Approximation Theorem, $\nu$ is
completely determined by its integral against all polynomials $t^i$.
Hence $\nu$ is a point mass at $t_0$, and $t_0$ is extreme.

We can summarize our results so far by saying that no point of $S$ is a
limit point, that the boundary $B$ is homeomorphic to the unit interval
under the correspondence

$$K((m, i), t) = 2^m t^i (1 - t)^{m-i},$$

and that every point of the boundary is extreme.

The Strong Law of Large Numbers, when applied to coin tossing, is
equivalent to the statement that harmonic measure $\mu$ concentrates
all its mass at $t = 1/2$. We can verify this result directly: We do have

    $K((m, i), 1/2) = 1,$

and, by the uniqueness half of Theorem 10-41, $\delta_{1/2}$ can be the only
measure yielding 1. Hence $\delta_{1/2}$ is harmonic measure; that is, harmonic
measure concentrates all its mass at $t = 1/2$.

By Theorem 10-41 every measure $\mu^h$ on the unit interval gives rise
to a non-negative regular function h for the process, and $\mu^h$ is harmonic
measure for the h-process. We shall discuss only two special cases.

If $\mu^h$ is the unit mass at a point $t_0$ other than 0 or 1, then

    $h_{(m,i)} = 2^m t_0^i (1 - t_0)^{m-i}$

and

    $P^h_{(m,i),(m+1,i+1)} = t_0.$

Thus the h-process is space-time coin tossing in which there is
probability $t_0$ of heads and probability $1 - t_0$ of tails. The fact that
$\mu^h(\{t_0\}) = 1$ means that with probability one the ratio of heads to
total tosses tends to $t_0$, again in agreement with the Strong Law of
Large Numbers.

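If, instead, $\mu^h$ is Lebesgue measure, then

    $h_{(m,i)} = \int_0^1 2^m t^i (1 - t)^{m-i} \, dt
               = 2^m \frac{\Gamma(i + 1)\,\Gamma(m - i + 1)}{\Gamma((m + 1) + 1)}
               = 2^m \frac{i!\,(m - i)!}{(m + 1)!}.$

Consequently, the transition probabilities in the h-process are again
computable, and we find

    $P^h_{(m,i),(m+1,i)} = \frac{m - i + 1}{m + 2}$

and

    $P^h_{(m,i),(m+1,i+1)} = \frac{i + 1}{m + 2}.$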

The significance of this process is discussed in Problems 28 and 29. 
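
The two transition probabilities for the Lebesgue-measure case can be confirmed by a short exact computation. The sketch below assumes the usual h-process formula $P^h_{ij} = P_{ij} h_j / h_i$ and the closed form $h_{(m,i)} = 2^m i! (m - i)! / (m + 1)!$. The helper names are hypothetical and the range of m that is checked is arbitrary.

```python
from fractions import Fraction
from math import factorial

def h(m, i):
    """h_(m,i) = 2^m i! (m-i)! / (m+1)!, the Lebesgue-measure case."""
    return Fraction(2**m * factorial(i) * factorial(m - i), factorial(m + 1))

def ph_tail(m, i):
    # h-process probability of (m, i) -> (m+1, i): P_ij * h_j / h_i.
    return Fraction(1, 2) * h(m + 1, i) / h(m, i)

def ph_head(m, i):
    # h-process probability of (m, i) -> (m+1, i+1).
    return Fraction(1, 2) * h(m + 1, i + 1) / h(m, i)

checks = [(ph_tail(m, i) == Fraction(m - i + 1, m + 2),
           ph_head(m, i) == Fraction(i + 1, m + 2))
          for m in range(8) for i in range(m + 1)]
print(all(a and b for a, b in checks))
```

If (m, i) is read as an urn holding m + 2 balls of which i + 1 are white, the head probability (i + 1)/(m + 2) is the chance of drawing a white ball.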



14. Problems 

1. If $\pi$ assigns positive weight only to finitely many states, show that
$\pi K(\cdot, x) = 1$ for every boundary point x. What is the corresponding
condition so that $K(\cdot, x)$ is regular for every boundary point?

2. Consider the identity

    $\int_{S^*} PK(\cdot, x) \, d\mu(x) = P \int_{S^*} K(\cdot, x) \, d\mu(x) = P1.$

What can we conclude about $\mu$ if P1 = 1? What if $(P1)_i < 1$?

Problems 3 to 12 deal with sums of independent random variables on the 
integers with p_i = f, Pi = 9 , P 2 = I- This example was discussed in 
Section 5-8, and the results there obtained will be useful. 

3. Find N. 

4. Let $\pi_0 = 1$, and compute K(i, j).

5. Show that the two boundary points are $\pm\infty$, and find the two minimal
functions. 

6. From the form of these two functions determine what weight $\mu$ must
put at each point. Does this agree with your understanding of the
long-range behavior of the chain?

7. Let hi = Find P^ and Check the representation of h. 
8. Show that the h of Problem 7 is minimal by proving that $P^h$ has only
constants for bounded regular functions.

9. Let $f_i = \delta_{i0}$ and g = Nf. Find J(i, j).

10. Show that the entrance boundary is “the same” as the exit boundary, 
and find the two minimal measures. 

11. Find $p(-\infty)$ and $p(+\infty)$.

12. Construct extended chains representing the two minimal measures of 
Problem 10 (in the sense of Theorem 10-9). For each chain, verify that 

Pr[ 2 „ e CT] = J p(x)d/ji.'’{x). 

Problems 13 to 20 refer to a “double basic example.” Let 

>S = {0, 1,2,...,0M',2',...}. . 

The non- zero entries of P are 



— Pii ^i-1,0' — ^ ~ Pi\ P (i-l)'.i' “ Pi'i Pii-iy,0 ~ Pv- 
Let 

i i 

Po = Po' = 1. ft = rii’i’ ft' = n Pi’ 

k=l k=l 

= hm 

Assume that j8oo > 0 and j8oo' > 0. 

13. Find H. [Hint: Use Propositions 4-14 and 4-16.] 

14. Let $\pi_0 = 1$, and compute K.

15. Show that there are two boundary points which correspond to $\infty$ and
$\infty'$, and find the minimal functions.

16. From the forms of the two minimal functions, find $\mu$. Give an intuitive
interpretation for the two weights. 

17. Find the most general non-negative normalized regular function h. 
Show that h is a convex combination of the two minimal functions.

18. Show that any such h tends to at each boundary point. 

19. Let h = \K{ • , oo) + f A( • , — oo). Show that P^ goes to oo with prob- 
ability ^ by computing /j^{oo). 

20. For the example of Problem 19, compute P^ explicitly if 



_ i(i + 2) 

“ (Tnr 



and = 



(i -f l)(i + 3) 
(i + 2)2 



Verify the conclusion of Problem 19 by computing for 



E — {iy i 4" 1) 2, . . .} 

and by letting i tend to oo. 

Problems 21 to 27 deal with a certain tree-process. The states are all finite
sequences of H and T. The empty sequence is the starting state and is
denoted state 0. From state $a = (a_1, a_2, \ldots, a_n)$ the process is equally
likely to go to $(a_1, a_2, \ldots, a_n, H)$ or $(a_1, a_2, \ldots, a_n, T)$.



21. Find N.

22. Let $\pi_0 = 1$, and compute K.

23. Show that the boundary is the set of all infinite sequences of H and T.

24. What is the topology of the boundary? [It is not that of the unit interval.]

25. Use the identity

    $\Pr_a[x_v \in C] = \int_C K(a, x) \, d\mu(x)$

to find the measure $\mu$.

26. Let h be defined by

    $h_a = 1$   if a = 0,
    $h_a = 2$   if a = (H),
    $h_a = 4$   if a begins with $a_1 = a_2 = H$,
    $h_a = 0$   otherwise.

Prove that h is a normalized regular function. Find $P^h$ and $\mu^h$. Show
that h is continuous and that $h(x) = d\mu^h/d\mu$ for a.e. x.

27. Let $f_a = 1/m$ if a consists of exactly m H's and m T's, and let $f_a$ equal 0
otherwise. Show that f is a charge, and verify that

    $\Pr[\lim g(x_n) = 0] = 1.$

Problems 28 and 29 deal with an instance of Polya’s urn scheme. An urn 
contains some white balls and some black balls. A drawing is made with 
each ball equally likely to be drawn; the ball drawn is then replaced and 
another of the same color is added to the urn. This scheme is repeated over 
and over. 



28. Let the pair (m, i) stand for m + 2 balls in the urn with i + 1 of them
white. Show that if the outcomes of the Polya urn scheme are taken as
such pairs (m, i), then the resulting process is a Markov chain. Note
that the transition matrix for this chain is identical with the one for the
h-process considered at the very end of Section 13.

29. Let the scheme be started with 1 white ball and 1 black ball, and let
$r_m$ be the fraction of white balls at time m. Use the observation in Problem
28 to compute

lim sup Pr[|r„ - < e]. 

m 

Problems 30 to 34 establish the necessity of a necessary and sufficient 
condition that a transient chain P with all pairs of states communicating 
have a non-zero non-negative regular measure a. Number the states 
0, 1,2,.... Let ^Lij- be the probability that the process started at i reaches 
j and that the first visit is immediately preceded by a visit to a state > k.
For instance, = Hij. The condition on P if a exists is as follows: 
There must exist an infinite set E of states such that 

k T 

lim lim = 0 for all j. 

k-* 00 i-t-ao jJ-ii 
ieE 
30. Prove that



H,, - N^o nv 



for any n with P{,"* > 0. 

31. Let 

= Pr,[a;„ = k A x„ ^ j if 0 < m < n]. 

Show that 

=11 

r = k n = 0 

and derive as a consequence that 

< Nij- — 2 ^ir^rj' 



32. 

33. 



Define / by /y = Syo- If an a > 0 exists with aP = a, show that there is 
a point Xq of for which */(a:o, • ) is a minimal measure. Choose E to 
be the set of states in some Cauchy sequence converging to Xq. 



Prove that 



lim lim 

te-*. 00 i-* 00 
i€E 




= 0 . 



34. Put together the preceding results to obtain a proof of the necessity of 
the stated condition. 



Problems 35 to 38 refer to a transient chain with absorbing states. As 
usual, we write the transition matrix in the form 



35. Put 




Show that P' has only transient states. Let $\pi$ be any starting distribution
for P' with $\pi N' > 0$. Prove that if i is an absorbing state of P,
then the harmonic measure $\mu(i)$ in P' is equal to the probability in P
of absorption in i.

36. If P is the infinite drunkard’s walk with p = |, find harmonic measure 
for P' when = 1. 

37. Suppose P1 =1. Find a necessary and sufficient condition on harmonic 
measure for P' that P should have been an absorbing chain. 

38. Let Q be any chain with all states transient and with the enlarged chain 
Q absorbing. Use the condition of Problem 37 to give a new proof that 
Q has no non-zero bounded regular functions h > 0. 
1. Entrance boundary for recurrent chains

Boundary theory for recurrent chains proceeds along altogether
different lines from the approach in Chapter 10. A clue to the
difficulty is that every non-negative superregular function is constant,
and hence the representation of such functions degenerates. Moreover,
since a recurrent chain is in every state infinitely often with
probability one, an almost-everywhere convergence theorem is out of
the question.

But intuitively, at least for some recurrent chains, there is some
limiting behavior going on. For instance, with the one-dimensional
symmetric random walk the Central Limit Theorem implies that the
probability that the process is on either half-axis after time n tends to
1/2 as n increases.

If P is a normal chain and E is a finite set, then $(P^n B^E)_{ij}$ is the
probability that the process started in i enters E after time n at state
j, and its limit $\lambda^E_j$ is the "long run" probability of entering E at j.
As we let E swell to the whole state space S, it is not clear from this
interpretation just what happens. Consider therefore the following
alternate interpretation: $(P^n B^E)_{ij}$ is the probability that the process
started in i at time $-n$ enters E after time 0 at state j, and the limit
$\lambda^E_j$ is the probability that the process started at time $-\infty$ enters E
after time 0 at state j. In this interpretation, if we let E swell to S and
pass to the limit in the appropriate sense, we can expect $\lambda^E$ to converge
to an entrance distribution for the chain.

Suppose we compute instead the limit of $P^n B^E$ on E followed by the
limit on n. By dominated convergence, we have $\lim_{E \to S} P^n B^E = P^n$.
Therefore, if we can justify the interchange of limits on E and n, we
see that each row of $P^n$ can be expected to converge to an entrance
distribution for the chain. This conjecture is in contrast with the
situation for transient chains, where $P^n$ converges to an exit distribution.
In other words, the limiting behavior of $P^n$ has something to do with
the past history of the process. Our procedure will therefore be as
follows: We start with a recurrent chain P, make it disappear after it
reaches 0, and apply transient entrance boundary theory to the resulting
process. The boundary obtained will be the entrance boundary
for P, and it will be suitable in a wide class of chains for describing the
limiting behavior of both $\lambda^E$ and $P^n$. We shall see that this procedure
is canonical in that it does not depend upon the choice of the state 0.

For the remainder of this chapter, let P be a recurrent chain, let 0
be a distinguished state, and let $\alpha$ be a finite-valued positive regular
measure; $\alpha$ is unique up to a constant factor.

We define a transient chain Q associated to P and 0 by

    $Q_{ij} = P_{ij}$ if $i \ne 0$,     $Q_{ij} = 0$ if $i = 0$.

For this transient chain $N_{ij} = {}^0N_{ij} + \delta_{j0}$. Choose f to be a column
vector which places a unit weight at 0. Then Nf = 1.

Form the Martin entrance boundary of Q with respect to the reference
function f. We have

    $J(i, j) = {}^0N_{ij} + \delta_{j0}.$

The compact metric space *S is the completion of S in the metric
described in Chapter 10, and we shall see in Proposition 11-1 that *S
does not depend upon the choice of the state 0. The spaces *S and
*S − S can unambiguously be called the completed space and the
recurrent entrance boundary, respectively, of the chain P. The set $B^e$
of extreme points of *S − S will also be independent of state 0, and
we are therefore free to speak of extreme points of the recurrent
boundary.

Since J(x, j) is well defined for x in *S, the above expression for J
shows that ${}^0N(x, j)$ is also well defined if we put

    ${}^0N(x, j) = {}^0N_{ij}$                    if $x = i \in S$,
    ${}^0N(x, j) = \lim_n {}^0N_{i_n j}$          if $x \in {}^*S - S$ and $x = \lim_n i_n$.



Proposition 11-1: If $^*S_0$ and $^*S_1$ are the completed spaces of P formed
with respect to two distinct states 0 and 1, then the identity map on S
extends to a homeomorphism of $^*S_0$ onto $^*S_1$. Under the homeomorphism
the extreme parts of the boundary correspond.
Proof: We first establish the identity

    ${}^1N_{ij} = {}^0N_{ij} - \frac{\alpha_j}{\alpha_1} \, {}^0N_{i1} + {}^1N_{0j}.$     (*)

In fact, the mean number of times the process started at i visits j from
the time of the first visit to 1 until before the first visit to 0 is
${}^0H_{i1} \, {}^0N_{1j}$. Thus if we compute in two ways the mean number of times
the process started at i visits j before reaching both 0 and 1, we obtain

    ${}^1N_{ij} + {}^0H_{i1} \, {}^0N_{1j} = {}^0N_{ij} + {}^1H_{i0} \, {}^1N_{0j}.$

Substitute ${}^1H_{i0} = 1 - {}^0H_{i1}$ and get

    ${}^1N_{ij} = {}^0N_{ij} + {}^1N_{0j} - {}^0H_{i1} ({}^0N_{1j} + {}^1N_{0j}).$

Applying Lemma 9-9 to P under the identification $i \to 1$, $j \to 0$, and
$k \to j$, we find that the term in parentheses equals $(\alpha_j/\alpha_1) \, {}^0N_{11}$.
Since ${}^0H_{i1} \, {}^0N_{11} = {}^0N_{i1}$, (*) follows from this relation.
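
Identity (*), read as ${}^1N_{ij} = {}^0N_{ij} - (\alpha_j/\alpha_1)\,{}^0N_{i1} + {}^1N_{0j}$, can be tested on a small finite chain. The sketch below is an illustration only, under several assumptions of ours: the three-state matrix P is an arbitrary example, the taboo matrix ${}^kN$ is taken to vanish in row and column k, and $\alpha$ is normalized by $\alpha_0 = 1$ using the fact that $\alpha_j/\alpha_0$ is the mean number of visits to j during an excursion from 0. The helper names `inverse` and `taboo_N` are hypothetical.

```python
from fractions import Fraction as F

def inverse(a):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(a)
    m = [row[:] + [F(int(r == c)) for c in range(n)] for r, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]

def taboo_N(P, k):
    """kN: mean visits to j from i before hitting k; zero in row and column k."""
    n = len(P)
    keep = [s for s in range(n) if s != k]
    inv = inverse([[F(int(a == b)) - P[a][b] for b in keep] for a in keep])
    N = [[F(0)] * n for _ in range(n)]
    for x, a in enumerate(keep):
        for y, b in enumerate(keep):
            N[a][b] = inv[x][y]
    return N

# Arbitrary irreducible example chain (an assumption for illustration).
P = [[F(0), F(1, 2), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 4), F(1, 2), F(1, 4)]]
n = len(P)
N0, N1 = taboo_N(P, 0), taboo_N(P, 1)

# alpha_j / alpha_0 = mean number of visits to j during an excursion from 0.
alpha = [F(1)] + [sum(P[0][k] * N0[k][j] for k in range(n)) for j in range(1, n)]

star_holds = all(
    N1[i][j] == N0[i][j] - alpha[j] / alpha[1] * N0[i][1] + N1[0][j]
    for i in range(n) for j in range(n))
print(alpha, star_holds)
```

In a finite chain there is no boundary, so the check says nothing about limits; it only confirms the algebra of (*) state by state.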

Equation (*) and the expression for the kernel J show immediately
that Cauchy sequences of S in the chain relative to state 0 are Cauchy
relative to state 1, and by symmetry the converse is also true. Thus
the statement about $^*S_0$ and $^*S_1$ follows.

For the assertion about the extreme points, let $J_0$ and $J_1$ be the
kernels for the two transient chains, and suppose, for instance, that
$J_1(y, \cdot)$ is normalized minimal and that $J_0(y, \cdot)$ is not. In any case,
$J_0(y, \cdot)$ is normalized since $J_0(y, 0) = 1$. Thus

    $J_0(y, \cdot) = \int_{{}^*S} J_0(x, \cdot) \, d\nu(x)$

by Theorem 10-48, where $\nu({}^*S) = 1$ and $\nu$ does not concentrate all its
mass at one point. Extending equation (*) to *S and using the
connection between the kernels and the N's, we have also

    $J_1(x, j) = J_0(x, j) + {}^1N_{0j} + \delta_{j1} - \frac{\alpha_j}{\alpha_1} J_0(x, 1) - \delta_{j0}.$

Integration of this equation against $\nu$ gives

    $J_1(y, j) = \int_{{}^*S} J_1(x, j) \, d\nu(x).$

Since $J_1(y, \cdot)$ is normalized minimal and $\nu$ spreads its weight over
more than one point, this last equation contradicts the dual of Lemma
10-33.
2. Measures on the entrance boundary

Let $\delta_{0\cdot}$ and $\delta_{\cdot 0}$ denote the 0th row and column, respectively, of the
identity matrix. In this section we consider finite-valued row vectors
$\nu$ for the Markov chain P with the following three properties:

(1) $\nu \ge 0$.

(2) $\nu_0 = 0$.

(3) $\nu P \le \nu + \delta_{0\cdot}$.

We shall associate to each such row vector a probability measure $\beta^\nu$ on
the entrance boundary and show how $\beta^\nu$ is related to $\nu$ probabilistically.

Direct calculation shows that the row vector $\bar\nu$ whose jth entry is
$\bar\nu_j = \nu_j + \delta_{0j}$ is non-negative and Q-superregular. We introduce a
process $Q^\nu$ which, when $\bar\nu > 0$, is the $\bar\nu$-dual of Q. Let
$S^\nu = \{i \mid \nu_i > 0\}$ and define

    $Q^\nu_{ij} = \frac{\bar\nu_j Q_{ji}}{\bar\nu_i}$     for $i, j \in S^\nu \cup \{0\}$.

All other entries of $Q^\nu$ are taken to be 0. Note that

    $N^\nu_{0j} = \frac{\bar\nu_j N_{j0}}{\bar\nu_0} = \nu_j + \delta_{0j}.$



Proposition 11-2: If $\nu$ satisfies (1), (2), and (3), then there is a unique
probability measure $\beta^\nu$ on the Borel sets of $S \cup B^e$ such that

    $\nu = \int_{S \cup B^e} {}^0N(x, \cdot) \, d\beta^\nu(x).$

Proof: By Theorem 10-48

    $\bar\nu = \int_{S \cup B^e} J(x, \cdot) \, d\beta^\nu(x)$

for a unique measure $\beta^\nu$. Since $\bar\nu_0 = 1$, $\bar\nu$ is normalized and
$\beta^\nu(S \cup B^e) = 1$. Hence

    $\nu = \int_{S \cup B^e} {}^0N(x, \cdot) \, d\beta^\nu(x).$

If another such probability measure is given, we can reverse the
argument and conclude the measure is $\beta^\nu$ by the uniqueness half of
Theorem 10-48.



We define $\theta^E_j$ to be the probability in the $Q^\nu$-process started at 0
that there is a last time the process is in E and that this occurrence is
at state j. Note that $\theta^E$ depends on P, 0, $\nu$, and E and that $\theta^E_j = 0$
for j not in E.

Proposition 11-3: If E is any subset of S containing 0 and if $\nu$ satisfies
(1), (2), and (3), then

    $\nu_E (I - P^E) = \theta^E - \delta_{0\cdot}.$

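Proof: Using as a set of alternatives the time when the $Q^\nu$-process is
last in E, we have, for $j \in E$,

    $\theta^E_j = \sum_{n=0}^{\infty} \Pr^\nu_0[x_n = j] \cdot \Pr^\nu_j[\text{process leaves E immediately and never returns}]$

    $= N^\nu_{0j} \Big[ 1 - \sum_{k \in E} \Big( Q^\nu_{jk} + \sum_{c,d \in S} Q^\nu_{jc} \, {}^EN^\nu_{cd} \, Q^\nu_{dk} \Big) \Big].$

Now $N^\nu_{0j} = \nu_j + \delta_{0j}$; therefore

    $N^\nu_{0j} Q^\nu_{jk} = \nu_k P_{kj}$

and

    $N^\nu_{0j} Q^\nu_{jc} \, {}^EN^\nu_{cd} \, Q^\nu_{dk} = \nu_k P_{kd} \, {}^EN_{dc} \, P_{cj}$

for all states, not just those in $S^\nu \cup \{0\}$. Substitution and use of
Lemma 6-6 give

    $\theta^E_j = \nu_j + \delta_{0j} - \sum_{k \in E} \nu_k P^E_{kj}.$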

In general, $\theta^E$ has total measure less than one, since the process
either may fail to reach E or may return to it infinitely often. If E is
finite and $0 \in E$, neither of these alternatives has positive probability
and thus $\theta^E 1 = 1$. This conclusion also follows from Proposition 11-3,
since finite matrices associate.

The special case E = S yields the following corollary. Let $\beta^\nu_S$ stand
for the restriction to S of the measure $\beta^\nu$ defined in Proposition 11-2.

Corollary 11-4: If $\nu$ satisfies (1), (2), and (3), then

    $\nu(I - P) = \beta^\nu_S - \delta_{0\cdot}.$

Proof: Let E = S in Proposition 11-3. The only way the process
can leave S is to disappear, and thus $\theta^S_j$ is the probability that the
$Q^\nu$-process disappears from state j. But this is just the definition of
$\beta^\nu_S(j)$, since $\beta^\nu$ was defined as harmonic measure for the $Q^\nu$-process
started at 0.

We conclude this section by noting the connection between $\theta^E$ and
$\beta^\nu$. The proof is contained in the proof of Proposition 10-21.

Proposition 11-5: If $\nu$ satisfies (1), (2), and (3) and if $\{E_k\}$ is an
increasing sequence of finite sets of states with union S, then the
measures $\theta^{E_k}$ converge to $\beta^\nu$ weak-star on *S.

3. Harmonic measure for normal chains 

We come now to the first convergence theorem. We shall prove
that if P is a normal chain, then the measures $\lambda^E$ converge weak-star
to a measure $\beta$ which will play the role of an entrance distribution for
P. This result agrees with the statement in Section 1.

Lemma 11-6: If P is normal, then the row vector $\nu = G_{0\cdot}$ satisfies
conditions (1), (2), and (3). Also, for any finite set E containing 0,

    $\nu_E (I - P^E) = \lambda^E - \delta_{0\cdot}.$

Proof: We know that $G_{0j} \ge 0$ and $G_{00} = 0$. Hence (1) and (2)
hold. Condition (3) follows by multiplying the definition of $G_{0\cdot}$
through on the right by P and applying Fatou's Theorem.

Form the K-matrix of Definition 9-80 with respect to the distinguished
state 0. By Lemma 9-81, $K_{0j} = G_{0j}$. Hence the formula
for $\nu_E (I - P^E)$ is the 0th row of the formula of Corollary 9-86.

Harmonic measure $\beta$ for a normal chain P is defined to be the
measure $\beta = \beta^\nu$ of Proposition 11-2 for $\nu = G_{0\cdot}$. The justification for
a name independent of 0 is contained in the following theorem.

Theorem 11-7: If $\{E_k\}$ is an increasing sequence of finite sets with
union S, then the measures $\lambda^{E_k}$ converge weak-star to $\beta$ on *S. The
measure $\beta$ is independent of the distinguished state 0. Also

    $G_{ij} = \int_{{}^*S} {}^iN(x, j) \, d\beta(x)$

and

    $G(I - P) = 1\beta_S - I.$

Proof: Ultimately the sets $E_k$ contain 0, and Proposition 11-3 and
Lemma 11-6 apply. From these results we obtain $\lambda^{E_k} = \theta^{E_k}$, and from
Proposition 11-5 we conclude that $\lambda^{E_k}$ converges to $\beta$. Thus $\beta$ is given
a characterization independent of the state 0 (since $\lambda^E$ does not depend
on 0). Since $\beta$ does not depend on 0, we can use any state i as
distinguished state in Proposition 11-2 and Corollary 11-4. The two
formulas of the theorem then follow.
4. Continuous and T-continuous functions

The first convergence theorem, which was proved in Section 3,
resulted from examining a particular row vector $\nu$ satisfying conditions
(1), (2), and (3) of Section 2, namely $\nu = G_{0\cdot}$. We now begin to work
toward the second convergence theorem, and we do so by examining
the vector $\nu = {}^0N(x, \cdot)$. For the present we do not assume that P is
normal.

We start by checking that $\nu = {}^0N(x, \cdot)$ does satisfy the three conditions
and by identifying $\theta^E$ for $\nu$. In fact, $\nu \ge 0$ and $\nu_0 = 0$ because
${}^0N(i, \cdot) \ge 0$ and ${}^0N(i, 0) = 0$. Thus (1) and (2) hold. Moreover we
have, for every $i \ne 0$,

    ${}^0N(i, \cdot) P \le {}^0N(i, \cdot) + \delta_{0\cdot}.$

Thus we can let $i \to x$, using Fatou's Theorem, and obtain (3).

The procedure for calculating $\theta^E$ is to do so for the approximating
vectors ${}^0N(i, \cdot)$ first.

Lemma 11-8: If E is a finite set containing 0 and j, then

    $\sum_{k \in E} {}^0N_{ik} (I - P^E)_{kj} = B^E_{ij} - \delta_{0j}.$

Proof: Apply Lemma 9-37 and conclusion (1) of Theorem 9-15.

Remark: The lemma remains valid for infinite sets E containing 0
and j, and the proof consists in computing the dual of the left side by
means of Propositions 6-16 and 6-17 and Lemma 9-14.

We shall say that a column vector h is continuous if it has a
(necessarily unique) extension to a continuous function h(x) defined on all
of *S.

In this notation the right side of the identity in Lemma 11-8 is
continuous for fixed $j \in E$, and hence $B^E_{\cdot j}$ is continuous if $j \in E$. But
$B^E_{\cdot j}$ is identically zero for $j \notin E$ and any state can be taken as the
reference state 0. Thus we are justified in writing $B^E(x, j)$ for the
continuous extension of the jth column of $B^E$ whenever E is a non-empty
finite set. By an elementary continuous function is meant a
finite linear combination of such functions.

Passing to the limit $i \to x$ in Lemma 11-8, we have

    $\sum_{k \in E} {}^0N(x, k) (I - P^E)_{kj} = B^E(x, j) - \delta_{0j}$

whenever $0 \in E$, $j \in E$, and E is finite. Consequently $\theta^E = B^E(x, \cdot)$
by Proposition 11-3, provided E is a finite set containing 0. (Note
both sides are zero for states j not in E.)

Let us denote the measure $\beta^\nu$ obtained from Proposition 11-2 for
$\nu = {}^0N(x, \cdot)$ by $\beta^x$. If $\{E_k\}$ is an increasing sequence of finite sets
with union S, then Proposition 11-5 asserts that $B^{E_k}(x, \cdot)$ converges
weak-star to $\beta^x$. We have thus proved the following proposition.

Proposition 11-9: If h is a continuous column vector and if $\{E_k\}$ is an
increasing sequence of finite sets with union S, then

    $\lim_k B^{E_k}(x, \cdot) h = \int_{{}^*S} h(y) \, d\beta^x(y)$

pointwise for x in *S.

If $\nu = {}^0N(x, \cdot)$, then the associated Q-superregular measure is
$\bar\nu = J(x, \cdot)$. Since $\beta^x$ is the measure for $J(x, \cdot)$, $\beta^x$ concentrates its
mass at x if x is in S or in $B^e$. Thus the right side of the identity of
Proposition 11-9 equals h(x) for all such x.

Define a linear transformation T of continuous functions to bounded
functions on *S by

    $Th(x) = \int_{{}^*S} h(y) \, d\beta^x(y).$

A continuous function h such that Th = h is said to be T-continuous.
(The motivation for this name appears as Problem 2 at the end of the
chapter.)

We conclude this section by characterizing the T-continuous func- 
tions. Notice that if every boundary point is extreme, then every 
continuous function is T-continuous. 

Lemma 11-10: If h is an elementary continuous function, then
Th = h.

Proof: It suffices to consider the function $B^E_{\cdot j}$, where E is a finite set.
If $E \subset E_k$, then $B^{E_k} B^E = B^E$ by Proposition 5-8. Passing to the
limit as $i \to x$ with k fixed, we have

    $B^{E_k}(x, \cdot) B^E_{\cdot j} = B^E(x, j).$

Hence the left side of the identity in Proposition 11-9 is $B^E(x, j)$. But
the right side is $TB^E(x, j)$.

Lemma 11-11: If h is T-continuous and $\{E_k\}$ is an increasing sequence
of finite sets with union S, then for any $\epsilon > 0$ some convex combination
of the functions $B^{E_k}(x, \cdot)h$ is within $\epsilon$ of h uniformly for x in *S.

Proof: By Proposition 11-9, the functions $B^{E_k}(x, \cdot)h$ converge
pointwise to h(x). Thus by dominated convergence their integrals
against any Borel measure on *S converge to the integral of h. That
is, the functions converge to h weakly. Thus h is in the weak closure
of the set $\{B^{E_k}(x, \cdot)h\}$, is certainly in the weak closure of the convex hull
of the set, and must therefore be in the strong closure of the convex
hull of the set. (See Dunford and Schwartz [1958], p. 422, Corollary
14.)

Proposition 11-12: The set of T-continuous functions is exactly the 
uniform closure of the set of elementary continuous functions. 

Proof: Every T-continuous function is contained in the uniform
closure according to Lemma 11-11, since $B^{E_k}(x, j)$ vanishes unless j is
in $E_k$. Conversely, every elementary continuous function is
T-continuous by Lemma 11-10, and the uniform limit of T-continuous
functions is T-continuous, since T has norm no greater than one.

5. Normal chains and convergence to the boundary 

As an application of the machinery of Section 4, we can prove the
second convergence theorem: each row of $P^n$ converges weak-star
to the harmonic measure $\beta$ in a suitable class of normal chains. This
result was suggested in the discussion in Section 1, and it was pointed
out that the key to the proof should be a certain interchange of limits.
In fact, this interchange has already taken place and is concealed in
the proof of Lemma 11-11.

We begin with a particularly sharp form of the convergence theorem. 

Theorem 11-13: If P is normal, if i is any state in S, and if h is
T-continuous, then

    $\lim_n (P^n h)_i = \int_{{}^*S} h(x) \, d\beta(x).$

Conversely, if this equation holds for all states i and all T-continuous
h, then P is normal.

Proof: Let $\epsilon > 0$ be given and let $\{E_k\}$ be an increasing sequence of
finite sets with union S. Since h is continuous, we can choose $k_0$ large
enough so that

    $\Big| \int_{{}^*S} h(x) \, d\beta(x) - \lambda^{E_k} h \Big| < \epsilon$

for all $k \ge k_0$ by Theorem 11-7. Truncate the sequence of sets so that
it contains only those sets $E_k$ with $k \ge k_0$. Since h is T-continuous,
we can apply Lemma 11-11 to the truncated sequence to obtain a
convex combination of the functions $B^{E_k}(x, \cdot)h$ which is uniformly
within $\epsilon$ of h. If, say,

    $\Big\| \sum_j c_j B^{E_j} h - h \Big\| < \epsilon,$

then also

    $\Big| P^n_{i\cdot} \Big( \sum_j c_j B^{E_j} h - h \Big) \Big| < \epsilon$

since $P^n 1 = 1$ and $P^n \ge 0$. Consequently,

    $\Big| \int_{{}^*S} h(x) \, d\beta(x) - P^n_{i\cdot} h \Big|
        \le \Big| \int_{{}^*S} h(x) \, d\beta(x) - \sum_j c_j \lambda^{E_j} h \Big|
          + \Big| \sum_j c_j (\lambda^{E_j} - P^n_{i\cdot} B^{E_j}) h \Big|
          + \Big| P^n_{i\cdot} \Big( \sum_j c_j B^{E_j} h - h \Big) \Big|$

    $< 2\epsilon + \Big| \sum_j c_j (\lambda^{E_j} - P^n_{i\cdot} B^{E_j}) h \Big|.$

The sum on the right is a finite sum and in each summand only finitely
many entries of $\lambda^{E_j} - P^n_{i\cdot} B^{E_j}$ can be non-zero. Since
$P^n_{i\cdot} B^{E_j} \to \lambda^{E_j}$ pointwise, we conclude that the sum on the right side
is less than $\epsilon$ for n sufficiently large.

The converse follows by applying the assumption to columns of $B^E$
for two-point sets E.

The convergence theorem is as follows. 

Theorem 11-14: If P is normal and if $B = B^e$, then each row of $P^n$
converges weak-star to the harmonic measure $\beta$.

Proof: Apply Theorem 11-13. If $B = B^e$, then every continuous
function is T-continuous.

Thus for normal chains with $B = B^e$, the measure $\beta$ indicates what
the chain is "near" in the long run. In the case of null chains with an
additional property, we can show that the chain is near the boundary
in the long run.

Proposition 11-15: If P is a normal null chain with $B = B^e$ in which
every one-point set in S is open, then $\beta(S) = 0$. That is, the measure is
entirely on the boundary.

Proof: If i is given, then the characteristic function of {i} is
continuous. By Theorem 11-14, $\lim_n P^n_{0i} = \beta(i)$. But for a null chain,
$P^n$ tends to zero.







Under the hypotheses of Proposition 11-15, the number $\beta(X)$ for any
Borel subset X of B may be interpreted as the probability that the
process is near this part of the boundary in the long run. For example,
if x is at a positive distance from other boundary points, then any
sufficiently small neighborhood E of x will have a continuous
characteristic function. By Theorem 11-14,

    $\lim_n \Pr_i[x_n \in E] = \lim_n \sum_{j \in E} P^n_{ij} = \beta(x).$

Let us see what our results say for a noncyclic ergodic chain P.
Such a chain is necessarily normal, and $\alpha$ may be chosen to have total
measure one. If this choice is made, then $G(I - P) = 1\alpha - I$ by
Corollary 9-51. Comparison with Theorem 11-7 shows that $\alpha = \beta$.
Thus the harmonic measure is concentrated entirely on S, in contrast
with Proposition 11-15. The measure $\beta$ is a generalization to all
normal chains of the measure $\alpha$ for noncyclic ergodic chains. Thus
our results generalize to all normal chains results known for ergodic
chains. For example, the representation

    $G_{ij} = \int_{S \cup B^e} {}^iN(x, j) \, d\beta(x)$

is a generalization of the identity $G_{ij} = \sum_k \alpha_k \, {}^iN_{kj}$, which holds for

noncyclic ergodic chains. (Theorem 9-26 gives 

and Proposition 1-57 yields = 2 ^^kj-) As a second example, 

$(P^n h)_i$ converges to $\alpha h$ for any bounded function h if P is noncyclic
ergodic (Lemma 9-52). This result is generalized in Theorem 11-13,
but in this theorem we had to make a stronger assumption about h.
The difference arises because in a noncyclic ergodic chain each row of
$P^n$ actually converges to $\alpha$ in the norm topology of measures, not just
in the weak-star topology (see Theorem 6-38).
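
For a finite noncyclic ergodic chain the convergence of each row of $P^n$ to $\alpha$ is easy to observe numerically. The following sketch uses a three-state example of our own choosing (not a chain from the text) whose regular probability vector is $\alpha = (8/35, 15/35, 12/35)$. It raises P to a high power by repeated multiplication and reports the largest entrywise gap between the rows and $\alpha$, together with the spread of $P^n h$ around $\alpha h$ for one bounded h.

```python
def mat_mul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Example chain (an assumption for illustration); every row sums to one.
P = [[0.0, 0.5, 0.5],
     [1 / 3, 1 / 3, 1 / 3],
     [0.25, 0.5, 0.25]]

# Regular probability vector of this P, so that alpha P = alpha.
alpha = [8 / 35, 15 / 35, 12 / 35]

Pn = P
for _ in range(60):
    Pn = mat_mul(Pn, P)

# Largest deviation of any row of P^n from alpha.
max_gap = max(abs(Pn[i][j] - alpha[j]) for i in range(3) for j in range(3))

# (P^n h)_i should be close to alpha h for a bounded column vector h.
h = [1.0, -2.0, 5.0]
alpha_h = sum(a * x for a, x in zip(alpha, h))
max_h_gap = max(abs(sum(Pn[i][j] * h[j] for j in range(3)) - alpha_h)
                for i in range(3))
print(max_gap, max_h_gap)
```

Here the convergence is in norm, which is why any bounded h works; in the general normal case only T-continuous h are covered by Theorem 11-13.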

6. Representation Theorem 

Beginning in this section, we connect the results we have obtained 
so far with the results of Chapter 9 on potential theory. We start by 
proving a representation theorem. For the moment we do not assume 
that P is normal. 

If $\mu$ and $\nu$ are row vectors with $\mu = \nu(I - P)$, then $\mu$ will be called
the deviation (from regularity) of $\nu$. If $|\mu|1$ is finite, then we say that $\nu$
is of totally finite deviation. Dually $f = (I - P)h$ is the deviation of
h, and h is of totally finite deviation if $\alpha|f|$ is finite.

Theorem 11-16: If $\nu$ is a non-negative row vector whose deviation $\mu$
is totally finite, then $\mu 1 \le 0$ and there is a probability measure $\pi$ on
$B^e$ such that

    $\nu_j = \nu_0 (\alpha_j/\alpha_0) + (\mu \, {}^0N)_j - (\mu 1) \int_{B^e} {}^0N(x, j) \, d\pi(x).$

If $\mu 1 \ne 0$, then the probability measure $\pi$ is uniquely determined.

Proof: We know that

    $N_{ij} = {}^0N_{ij} + \delta_{0j}.$

If we put $\bar\mu = \nu(I - Q)$, we have $\bar\mu_j = \mu_j + \nu_0 P_{0j}$ by direct calculation.
We therefore get

    $(\bar\mu N)_j = (\mu \, {}^0N)_j + (\mu 1)\delta_{0j} + \nu_0 (P_{0\cdot} N)_j$
    $= (\mu \, {}^0N)_j + (\mu 1)\delta_{0j} + \nu_0 (\alpha_j/\alpha_0).$

From Proposition 8-7 applied to the chain Q, we see that $\nu = \bar\mu N + \rho$,
where $\rho$ is regular and non-negative. The calculation of $\bar\mu N$ yields

    $\rho_j = \nu_j - (\bar\mu N)_j = -(\mu 1)$                                     if $j = 0$,
    $\rho_j = \nu_j - (\bar\mu N)_j = \nu_j - \nu_0 (\alpha_j/\alpha_0) - (\mu \, {}^0N)_j$     if $j \ne 0$.

Since $\rho_0 \ge 0$, $\mu 1 \le 0$. If $\mu 1 = 0$, then $\rho$ is a Q-regular measure $\ge 0$
with $\rho_0 = 0$. Since it is possible to get from any state to 0 eventually
in the Q-process, $\rho$ vanishes identically. The representation then
follows immediately from the expression for $\rho$.

Thus suppose $\mu 1 \ne 0$. Define $\sigma$ by

    $\sigma_j = -(\mu 1)^{-1} [\nu_j - \nu_0 (\alpha_j/\alpha_0) - (\mu \, {}^0N)_j].$

We claim that $\sigma$ satisfies conditions (1), (2), and (3). It is clear that
$\sigma_0 = 0$, and the fact that $\sigma \ge 0$ follows from what we have shown
above. For (3), we have

    $(\sigma P)_j = -(\mu 1)^{-1} (\rho Q)_j = -(\mu 1)^{-1} \rho_j = \sigma_j + \delta_{0j},$

and thus equality holds. Thus, except for the assertion that $\pi$ is
concentrated on $B^e$, the rest of the theorem is immediate from
Proposition 11-2. So simply note from the proof of that proposition that
if we have equality in (3), then $\pi$ is concentrated on $B^e$.
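
The algebra in the proof can be illustrated on a finite chain, where the situation is degenerate in a helpful way: every row vector $\nu$ has deviation $\mu$ with $\mu 1 = \nu 1 - \nu P 1 = 0$, so the boundary integral drops out and the representation reads $\nu_j = \nu_0(\alpha_j/\alpha_0) + (\mu\,{}^0N)_j$. The sketch below checks this in exact arithmetic. The three-state matrix P and the vector $\nu$ are arbitrary choices of ours, ${}^0N$ is taken to vanish in row and column 0, and $\alpha$ is normalized by $\alpha_0 = 1$.

```python
from fractions import Fraction as F

# Arbitrary irreducible example chain (an assumption for illustration).
P = [[F(0), F(1, 2), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 4), F(1, 2), F(1, 4)]]

# 0N on states {1, 2}: inverse of (I - P) restricted to those states,
# written out with the 2x2 adjugate formula.
a, b = 1 - P[1][1], -P[1][2]
c, d = -P[2][1], 1 - P[2][2]
det = a * d - b * c
N0 = [[F(0), F(0), F(0)],
      [F(0), d / det, -b / det],
      [F(0), -c / det, a / det]]

# alpha with alpha_0 = 1: mean visits to j during an excursion from 0.
alpha = [F(1)] + [sum(P[0][k] * N0[k][j] for k in range(3)) for j in (1, 2)]

# Any non-negative row vector and its deviation mu = nu (I - P).
nu = [F(1), F(2), F(3)]
mu = [nu[j] - sum(nu[i] * P[i][j] for i in range(3)) for j in range(3)]

# Representation with mu 1 = 0: nu_j = nu_0 alpha_j / alpha_0 + (mu 0N)_j.
rebuilt = [nu[0] * alpha[j] / alpha[0] + sum(mu[i] * N0[i][j] for i in range(3))
           for j in range(3)]
print(sum(mu), rebuilt == nu)
```

The term with $\pi$ can only appear for infinite chains, where $\mu 1$ may be strictly negative.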
We define the exit boundary of P to be the entrance boundary of $\hat{P}$.
That is, $S^* = {}^*\hat{S}$ and $B_e = \hat{B}^e$. We obtain a dual for Theorem 11-16.

Theorem 11-17: If h is a non-negative column vector whose deviation
f is totally finite, then $\alpha f \le 0$ and there is a probability measure $\pi$ on
$B_e$ such that

= + ^ f i)dn(x). 

JBe 

If $\alpha f \ne 0$, then the probability measure $\pi$ is uniquely determined.

We turn to results connected with potential theory. We first apply 
Theorem 11-17 to give elementary continuous functions a characteriza- 
tion which is valid for all recurrrent chains, normal or not. 

Proposition 11-18: If h is an elementary continuous function, then

(1) $(I - P)h = f$ has finite support.

(2) $h = c1 + {}^0Nf$ for some c.

(3) $\alpha f = 0$.

(4) $B^E h = h$ for any set E containing the support of f.

Conversely, if (1) and (2) hold, then h is an elementary continuous
function.

Proof: $(I - P)B^E$ is equal to $I - P^E$ for states in E and is 0
otherwise. Hence $\alpha[(I - P)B^E] = 0$ and (1) and (3) hold for columns
of $B^E$. Since an elementary continuous function is a finite linear
combination of such functions for finite sets E, (1) and (3) follow. In
particular, any column of $B^E$ is of totally finite deviation and Theorem
11-17 applies. The representation in that theorem establishes (2) for
columns of $B^E$, and the general result follows by linearity.

We shall complete the proof by showing that (2) implies (4) and that
(1) and (4) imply that h is an elementary continuous function. Suppose
(2) holds. Let $0 \in E$. Then ${}^0N - {}^EN = B^E \, {}^0N$ by Lemma 9-14. If
E contains also the support of f, then ${}^ENf = 0$, and we see directly that
$B^E h = h$. So (4) holds. If (1) and (4) hold, then (4) holds for some
finite set E, and h is exhibited as a finite linear combination of the
columns of $B^E$.

We now assume that P is normal and we shall prove statements for 
T-continuous functions that look like applications of Proposition 11-18 
followed by a passage to the limit. 
Proposition 11-19: If P is normal, then a function h of finite deviation
is T-continuous if and only if $h = c1 + g$ for a T-continuous potential
g. If the representation holds, then

    $c = \int_{{}^*S} h(x) \, d\beta(x).$

Proof: If $(I - P)h = f$ and if h is T-continuous, then

    $(I + P + \cdots + P^n) f = (I - P^{n+1}) h \to h - c1$

by Theorem 11-13. If h is of finite deviation, then f is a charge and the
limit $h - c1$ is a potential. This potential is T-continuous, since 1 is.
(Notice that 1 is the 0th column of $B^{\{0\}}$.) The converse is clear.

Corollary 11-20: If $P$ is normal and if $g$ is a $T$-continuous potential, 
then 

$$\int_{S^*} g(x)\,d\beta(x) = 0.$$

Proof: Potentials are of finite deviation by definition. Applying 
Proposition 11-19 and using the fact that 1 is not a potential (since its 
charge would have to be zero), we see that c = 0. 

Corollary 11-20 is a recurrent analog of the statement for transient 
boundary theory that potentials tend to zero along almost all paths of 
the process. 

Corollary 11-21: If $P$ is normal and if $h$ is a $T$-continuous function 
whose deviation $f$ is totally finite, then $\alpha f = 0$ and $h = c\mathbf{1} + {}^0Nf$. 

Proof: By Proposition 11-19, $h$ differs from the potential $g$ of $f$ by a 
constant. But $g = g_0\mathbf{1} + {}^0Nf$ by Theorem 9-15. 

Our final proposition enables us to give an interpretation to $\theta^E$ for 
certain infinite sets provided we have an interpretation for finite sets. 

Proposition 11-22: If $\{E_k\}$ is an increasing sequence of finite sets with 
union $E$ and if $\theta^E\mathbf{1} = 1$, then $\lim_k \theta^{E_k} = \theta^E$ pointwise. 

Proof: Since $\theta^{E_k}_i$ is an exit probability, it is decreasing in $k$ from some 
point on. Hence it tends to a limit, say $\theta'_i$. Since $\theta^{E_k}\mathbf{1} = 1$, we have 
$\theta'\mathbf{1} \le 1$ by Fatou's Theorem. For large $k$, we have also $\theta^E_i \le \theta^{E_k}_i$ and 
hence $\theta^E_i \le \theta'_i$. Thus $1 = \theta^E\mathbf{1} \le \theta'\mathbf{1} \le 1$. This statement is consistent 
with the inequality $\theta^E \le \theta'$ only if $\theta^E = \theta'$. 







As an application of Proposition 11-22, suppose P is normal and we 
choose Vj = (?oy. Then 6^ = for finite sets containing 0, and 6^ is 
thus a limit of A^-measures for any set with = 1, whether or not A^ 
exists for the infinite set E. The condition that 0^1 = 1 means that 
the transient chain leaves E with probability one; this condition is 
exactly the statement that E be an equilibrium set for Q^. For such a 
set E and for any reference state i g E, we have 

[G,(I - P^)], = - S„ 

by Proposition 11-3. Since, for two different reference states i in E, 
we have 0^ as the limit of the same sequence A^fc (by Proposition 
11-22), we may write 0^(1 — P^) = 1 Then 

(6§Ge)(I - P^) = 0 

and hence 9^0^ = ka for some constant k. If we form the iT-matrix 
relative to state 0, then Lemma 9-81 gives 

Gij- = K^J 4- (GiQ — C'io)(ay/ao)- 

Consequently, = k'a, provided 

y dfiiQ < 00 and T dfiiQ < oo. 

ieE i€E 

The conclusion that = k'a is exactly the statement that k' be the 

generalized capacity of E and 9^ be the (generalized) recurrent equili- 
brium charge for E in the sense of Definitions 9-88 and 9-113. How- 
ever, in the part of Chapter 9 where these points were discussed, we 
restricted ourselves to finite sets. By means of boundary theory we see 
we can extend these results to certain infinite sets. 

Finally, let us consider the special case where the space for $P$ has 
only one limit point $\infty$. Then a neighborhood of $\infty$ is simply the 
complement of a finite set not containing $\infty$. Hence the probability of 
being within this neighborhood tends to one if $P$ is null. Therefore $P^nh$ 
tends to $h(\infty)$ for any such null chain and any continuous function $h$. 

If $P$ is null and if $P$ and $\hat P$ both have only a single limit point, then 
$P$ must be normal. For such a chain, 

and 0,, = («,/«,) i), 

and the representation theorem takes on the simpler form 

$$h_i = h_0 + ({}^0Nf)_i - \frac{\alpha f}{\alpha_0}\,C_{i0}.$$

From Proposition 9-45 we know that 

$${}^0N_{ij} - (\alpha_j/\alpha_0)\,C_{i0} = C_{0j} - C_{ij}.$$

Hence 

$$h_i = h_0 + (C_{0\cdot} - C_{i\cdot})f.$$

Thus if $Cf$ is finite-valued, then $h$ differs from $-Cf$ by a constant. 

7. Sums of independent random variables 

The results of this chapter take on an especially neat-looking form 
when $P$ is a recurrent sums of independent random variables process 
with state space the lattice in $N$-dimensional space. Such a chain is 
always null, and Spitzer [1962] showed it is always normal, has only 
minimal boundary points, and has no points of $S$ as limit points. In 
two dimensions there is always a unique boundary point, whereas in one 
dimension there are one or two, depending on whether the distribution 
has infinite or finite variance. 

In the case of a single boundary point, a continuous function $h$ is 
one having a limit at infinity, and we have already noted that the 
convergence of $P^nh$ is trivial. In one dimension with finite variance, 
the two distinct boundary points correspond to $-\infty$ and $+\infty$, and $h$, to 
be continuous, must have limits in both directions. For such a 
function Theorem 11-14 states that $P^nh$ converges to the average of 
these two limits. This result also follows from the Central Limit 
Theorem. 
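The convergence of $P^nh$ to the average of the two limits can be checked numerically in a simple case. The sketch below is an illustration only and rests on assumptions not made at this point of the text: it takes the walk on the integers with steps $-1$ (probability 2/3) and $+2$ (probability 1/3), which has mean zero and finite variance, and takes $h$ to be the characteristic function of the positive integers, so that the limits of $h$ are 0 at $-\infty$ and 1 at $+\infty$. The helper name `prob_positive` is hypothetical.

```python
from math import exp, lgamma, log

def prob_positive(n: int) -> float:
    """(P^n h)_0 = Pr_0[x_n > 0] for steps -1 (prob 2/3) and +2 (prob 1/3).

    If K of the n steps equal +2, the position is 3K - n with
    K ~ Binomial(n, 1/3), so x_n > 0 exactly when 3K > n.
    """
    log_p, log_q = log(1 / 3), log(2 / 3)
    total = 0.0
    for k in range(n // 3 + 1, n + 1):
        # Binomial probability computed through logarithms to avoid overflow.
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += exp(log_pmf)
    return total

for n in (30, 300, 3000):
    print(n, round(prob_positive(n), 4))
```

Under these assumptions the printed values should approach 1/2, the average of the two limits of $h$, at the slow rate suggested by the Central Limit Theorem.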

There are such chains in one dimension for which $P^nh$ fails to con- 
verge for as nice a function as the characteristic function of the positive 
integers. However, this behavior can occur only if the variance is 
infinite; then there is only one boundary point, and $h$ is not continuous. 

If $h \ge 0$ is a function whose deviation $f$ is totally finite and if $Cf$ is 
finite-valued, then the representation theorem takes the form 

$$h = -Cf + \text{const.}$$

for the case of one boundary point and 

$$h_i = -(Cf)_i + ai + b$$

for the two-boundary-point one-dimensional case. We have already 
seen that the former identity holds for any chain with a single limit 
point. The latter follows from the representation theorem together 
with special knowledge of the nature of $G$ obtained by Spitzer for sums 
of independent random variables. 

8. Examples 

Example 1: Sums of independent random variables, $p_{-1} = \tfrac23$, 
$p_2 = \tfrac13$. 

This example shows the best possible boundary behavior for a null 
chain and illustrates the points discussed in Section 7. To deter- 
mine the recurrent entrance boundary we can work either with the 
kernel $J(i, j)$ associated to the chain $Q$ or with the matrix ${}^0N$ 
associated to $P$, since 

$$J(i, j) = {}^0N_{ij} + \delta_{j0}.$$

We use the latter. 

To compute ^N, we can proceed in the familiar way — first finding 
and then computing from the identities = P °iV,, = 
1/(1 - °P„), and ^Nij- = As usual, the calculation of 

involves a difference equation valid for certain intervals of i’s and fs. 
In this case, the equation is 

The general solution as a function of i is 

oPi, = A + Bi + C(-2Y 

and is valid always for a slightly larger interval of i’s. For instance, 
if 0 < j and if we are considering i’s satisfying 0 < i < j, then the 
difference equation is valid for 0 < i < j and the solution applies for 
0 < i < j 1. The initial conditions in this instance are °Poy = 

= 1 and j = 1, and they determine A, B, and C. Similar 

remarks apply for the other intervals of i’s and j’s, and the result is: 



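for $0 < j$, 

$${}^0N_{ij} = \begin{cases}
\tfrac13[1 - (-2)^{-j}][1 - (-2)^{i}] & i \le 0 \\
i - \tfrac13(-2)^{-j}[1 - (-2)^{i}] & 0 \le i \le j \\
j + \tfrac13[1 - (-2)^{-j}] & j \le i,
\end{cases}$$

for $j < 0$, 

$${}^0N_{ij} = \begin{cases}
\tfrac13(-2)^{i}[(-2)^{-j} - 1] - j & i \le j \\
\tfrac13[1 - (-2)^{i}] - i & j \le i \le 0 \\
0 & 0 \le i.
\end{cases}$$

Therefore there are two boundary points, $+\infty$ and $-\infty$, no point of $S$ 
is a limit point, and 

$${}^0N(+\infty, j) = \begin{cases}
j + \tfrac13[1 - (-2)^{-j}] & j \ge 0 \\
0 & j \le 0,
\end{cases}
\qquad
{}^0N(-\infty, j) = \begin{cases}
\tfrac13[1 - (-2)^{-j}] & j \ge 0 \\
|j| & j \le 0.
\end{cases}$$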
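As a sanity check on closed forms of this kind, one can verify with exact rational arithmetic that a candidate for ${}^0N$ satisfies the first-step equation ${}^0N_{ij} = \delta_{ij} + \tfrac23\,{}^0N_{i-1,j} + \tfrac13\,{}^0N_{i+2,j}$ for $i \ne 0$, with the convention ${}^0N_{0j} = 0$. The sketch below assumes the step distribution $p_{-1} = 2/3$, $p_2 = 1/3$ and the piecewise formula with coefficient 1/3 as read from this example; the function names `pw`, `n0`, and `residual` are hypothetical.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

def pw(k: int) -> Fraction:
    """(-2)**k as an exact fraction, for any integer k."""
    return Fraction(-2) ** k

def n0(i: int, j: int) -> Fraction:
    """Assumed closed form for 0N_ij (mean visits to j from i before hitting 0)."""
    if i == 0 or j == 0:
        return Fraction(0)
    if j > 0:
        if i < 0:
            return THIRD * (1 - pw(-j)) * (1 - pw(i))
        if i <= j:
            return i - THIRD * pw(-j) * (1 - pw(i))
        return j + THIRD * (1 - pw(-j))
    if i <= j:
        return THIRD * pw(i) * (pw(-j) - 1) - j
    if i < 0:
        return THIRD * (1 - pw(i)) - i
    return Fraction(0)

def residual(i: int, j: int) -> Fraction:
    """Defect in the first-step equation at a state i different from 0."""
    delta = 1 if i == j else 0
    return n0(i, j) - delta - Fraction(2, 3) * n0(i - 1, j) - THIRD * n0(i + 2, j)

R = 12
worst = max(abs(residual(i, j))
            for i in range(-R, R + 1) if i != 0
            for j in range(-R, R + 1) if j != 0)
print(worst)  # 0
```

The check covers only a finite window of states; it confirms consistency with the first-step equation there and is not a substitute for the derivation from the difference equation.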







From the Central Limit Theorem we know that for large $n$ the 
process is very likely to be far from 0 and equally likely to be on the 
right or on the left. Hence $\beta(+\infty) = \beta(-\infty) = \tfrac12$. Since $\beta$ assigns 
positive mass to each boundary point, both boundary points are 
extreme and consequently every continuous function is $T$-continuous. 
From Theorem 11-7 we have 

$$G_{0j} = \tfrac12\,{}^0N(+\infty, j) + \tfrac12\,{}^0N(-\infty, j),$$

and for sums of independent random variables we have also 

$$G_{ij} = G_{0,\,j-i}.$$

Therefore 

$$C_{ij} = G_{ij} = G_{0,\,j-i} = \begin{cases}
\tfrac12|j - i| + \tfrac13[1 - (-2)^{i-j}] & j > i \\
\tfrac12|j - i| & j \le i.
\end{cases}$$



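Let us see what the representation theorem says for this process. 
We first need to know ${}^0\hat N(x, i)$. We can take $\alpha = \mathbf{1}^T$, and we see that 
$\hat P_{ij} = P_{ji}$ and ${}^0\hat N_{ij} = {}^0N_{ji}$. For $\hat P$ we again have two boundary points, 
and 

$${}^0\hat N(+\infty, i) = \begin{cases}
\tfrac13[1 - (-2)^{i}] & i \le 0 \\
|i| & 0 \le i,
\end{cases}
\qquad
{}^0\hat N(-\infty, i) = \begin{cases}
|i| + \tfrac13[1 - (-2)^{i}] & i \le 0 \\
0 & 0 \le i.
\end{cases}$$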



Now let $h \ge 0$ be a function whose deviation $f$ is totally finite, and 
suppose $Cf$ is finite-valued. The representation theorem gives 

$$h_i = h_0 + ({}^0Nf)_i - (\alpha f)\bigl[\pi(+\infty)\,{}^0\hat N(+\infty, i) + \pi(-\infty)\,{}^0\hat N(-\infty, i)\bigr].$$

If we use the identity 

$${}^0N_{ij} - (\alpha_j/\alpha_0)\,C_{i0} = C_{0j} - C_{ij},$$

we obtain 

$$h_i = -(Cf)_i + (\alpha f)\bigl[\pi(-\infty) - \tfrac12\bigr]\,i + [h_0 + (Cf)_0].$$

This last equation is an example of the formula 

$$h_i = -(Cf)_i + ai + b$$

discussed in Section 7. 
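The identity ${}^0N_{ij} - (\alpha_j/\alpha_0)C_{i0} = C_{0j} - C_{ij}$ used in this step can be spot-checked with exact arithmetic. The sketch below is illustrative and assumes $\alpha = \mathbf{1}^T$, the step distribution $p_{-1} = 2/3$, $p_2 = 1/3$, the closed form for ${}^0N$ with coefficient 1/3, and $C_{ij} = G_{0,j-i}$ with $G_{0k} = \tfrac12|k| + \tfrac13[1 - (-2)^{-k}]$ for $k > 0$ and $\tfrac12|k|$ otherwise. The names `pw`, `n0`, `g`, and `c` are hypothetical.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

def pw(k: int) -> Fraction:
    """(-2)**k as an exact fraction."""
    return Fraction(-2) ** k

def n0(i: int, j: int) -> Fraction:
    """Assumed closed form for 0N_ij, with zero row and column at state 0."""
    if i == 0 or j == 0:
        return Fraction(0)
    if j > 0:
        if i < 0:
            return THIRD * (1 - pw(-j)) * (1 - pw(i))
        if i <= j:
            return i - THIRD * pw(-j) * (1 - pw(i))
        return j + THIRD * (1 - pw(-j))
    if i <= j:
        return THIRD * pw(i) * (pw(-j) - 1) - j
    if i < 0:
        return THIRD * (1 - pw(i)) - i
    return Fraction(0)

def g(k: int) -> Fraction:
    """Assumed G_{0k}: |k|/2, plus (1/3)[1 - (-2)^(-k)] when k > 0."""
    base = Fraction(abs(k), 2)
    return base + THIRD * (1 - pw(-k)) if k > 0 else base

def c(i: int, j: int) -> Fraction:
    """Assumed C_ij = G_{0, j-i} for this spatially homogeneous walk."""
    return g(j - i)

R = 10
identity_holds = all(
    n0(i, j) - c(i, 0) == c(0, j) - c(i, j)
    for i in range(-R, R + 1)
    for j in range(-R, R + 1)
)
print(identity_holds)  # True
```

Multiplying the identity by $f_j$ and summing over $j$ gives $({}^0Nf)_i = (\alpha f)C_{i0} + (Cf)_0 - (Cf)_i$, which is the substitution that turns the representation theorem into the linear-plus-potential form.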



Example 2: Basic example, null case. 
For the basic example we have 



$${}^0N_{ij} = \begin{cases}
\alpha_j/\alpha_i & \text{if } j \ge i > 0 \\
0 & \text{otherwise.}
\end{cases}$$

Hence, for fixed $j$, ${}^0N_{ij} = 0$ if $i$ is sufficiently large. Consequently 
there is a single limit point $p$ in $S^*$ and ${}^0N(p, i) = 0$ for all $i$. Since 
${}^0N(p, i) = {}^0N(0, i)$, we have $p = 0$; the limit point is the state 0. 
Thus the boundary is empty. 

We know from Section 9-6 that in a null basic example $\lambda^E_0 = 1$ if $E$ 
is a finite set containing zero. Therefore the harmonic measure $\beta$, which 
is the weak-star limit of such measures, assigns unit mass to state 0. 
Thus by Theorem 11-7, $G_{0j} = {}^0N(0, j) = 0$, in agreement with the 
result obtained in Chapter 9. This example shows that the condition 
on one-point sets in Proposition 11-15 cannot be omitted. 

We can also check directly, using the results of Section 9-6, that 
= ^N(0,j) and G(I — P) = ISq- — again in agreement with 
Theorem 11-7. 

The reverse $\hat P$ of the basic example has 

$${}^0\hat N_{ij} = \begin{cases}
1 & \text{if } i \ge j > 0 \\
0 & \text{otherwise.}
\end{cases}$$

Again there is one limit point $p'$, but this time we have ${}^0\hat N(p', j) = 
1 - \delta_{0j}$. The measure $J(p', \cdot)$ associated with the transient chain 

$$Q_{ij} = \begin{cases}
\hat P_{ij} & \text{if } i \ne 0 \\
0 & \text{if } i = 0
\end{cases}$$

satisfies $J(p', j) = {}^0\hat N(p', j) + \delta_{j0} = 1$ and is easily seen to be $Q$- 
regular, and it follows that $p'$ is a boundary point and is extreme. Put 
$+\infty = p'$. 

It is clear that $\hat\lambda^E_m = 1$ if $m$ is the largest element of the finite set $E$ 
and if $P$ is null. Hence $\hat\beta(+\infty) = 1$, in agreement with Proposition 
11-15. Therefore $\hat G_{0j} = {}^0\hat N(+\infty, j)$, and we find that $\hat G(I - \hat P) = -I$. 



Example 3: Three-line example. 

This example is designed to show that $P$ and $\hat P$ can have different 
limiting behavior and to show that in a normal null chain some addi- 
tional assumption is needed (such as $B = B_e$ in Theorem 11-14) to 
ensure that each row of $P^n$ converges weak-star to a limiting measure. 

Let $S$ consist of three copies of the non-negative integers with typical 
elements denoted by $i$, $i'$, and $i''$, respectively. The process $P$ moves 
deterministically to the left on the first and third lines and moves from 
0 or $0''$ to $0'$ also with probability one. On the middle line it moves 
one step to the right or moves to one of the other lines, as shown in the 
accompanying figure. The quantities on the arrows are transition 
probabilities, and $p$ and $q$ are positive numbers with sum one. The 
$p_i$ and $q_i$ are chosen as in the basic example, except that $q_1 = 0$; $\beta_i$ is 
defined as the product of the $p_j$'s up through $p_i$ as in the basic example, 
and we require that $\lim \beta_i = 0$ and $\sum \beta_i = +\infty$. 






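[Figure: transition diagram for the three-line example, showing the lines 
$L$, $L'$, and $L''$ with the transition probabilities on the arrows.] 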



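Clearly $\bar H_{0'0'} = 1 - \lim \beta_i = 1$, and thus $P$ is recurrent. If $\alpha$ is 
defined by 

$$\alpha_i = p\beta_i, \qquad \alpha_{i'} = \beta_i, \qquad \alpha_{i''} = q\beta_i,$$

then $\alpha$ is a regular measure. Since $\alpha\mathbf{1} = 2\sum \beta_i = +\infty$, the chain is 
null. 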

To see that P is normal, we compute °'A and ^"X. Since P is null, 
the process after a long time is likely to be far to the right of any state 
we consider. Thus 

{ p a ae L 
q if a e L" 

0 if a G L' and a / O'. 



The reverse process $\hat P$ moves a step to the right on $L$ (or $L''$) or switches 
to $L'$. On $L'$ it moves deterministically to the left until it reaches $0'$, 
and then it goes to 0 with probability $p$ and $0''$ with probability $q$. 
Thus 



{ 0 a ae L or ae U 
1 if a e L' and a ^ O'. 



By Theorem 9-26, P is normal. 

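We now determine the boundaries of $P$ and $\hat P$. If we choose $0'$ as 
the distinguished state, we see that if $a$ and $b$ are any two states 
different from $0'$, not necessarily on the same line, then 

$${}^{0'}N_{ab} = \begin{cases}
1 & \text{if } a, b \in L \text{ or } a, b \in L'',\ a \text{ to the right of } b \\
p & \text{if } a \in L',\ b \in L,\ a \text{ to the right of } b \\
q & \text{if } a \in L',\ b \in L'',\ a \text{ to the right of } b \\
\alpha_b/\alpha_a & \text{if } a \in L',\ a \text{ to the left of } b \\
0 & \text{otherwise.}
\end{cases}$$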







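Thus for each $b$, ${}^{0'}N_{ab}$ tends to a limit along each line, but, as functions 
of $b$, the limits are different for each line. The result is three boundary 
points $\infty$, $\infty'$, and $\infty''$, and we have 

$${}^{0'}N(\infty, b) = \begin{cases}
1 & \text{if } b \in L \\
0 & \text{otherwise,}
\end{cases}$$

$${}^{0'}N(\infty', b) = \begin{cases}
p & \text{if } b \in L \\
q & \text{if } b \in L'' \\
0 & \text{otherwise,}
\end{cases}$$

$${}^{0'}N(\infty'', b) = \begin{cases}
1 & \text{if } b \in L'' \\
0 & \text{otherwise.}
\end{cases}$$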



The measures $J(x, \cdot)$ defined for the associated transient chain satisfy 
$J(x, b) = {}^{0'}N(x, b) + \delta_{b0'}$, 
and direct calculation shows that 

$$J(\infty', \cdot) = pJ(\infty, \cdot) + qJ(\infty'', \cdot).$$

We conclude that $\infty$ and $\infty''$ are extreme boundary points, whereas 
$\infty'$ is not. 

If $E$ is a large finite set of states containing 0 and $0''$, then the chain 
$P$ is most likely to be to the right of $E$. Hence $\lambda^E_m = p$ and $\lambda^E_{m''} = q$, 
where $m$ and $m''$ are the last elements of $E$ on the first and third lines, 
respectively. From the form of $\lambda^E$, we deduce that 

$$\beta(\infty) = p \quad\text{and}\quad \beta(\infty'') = q.$$

We may interpret this result as follows: $L$, $L'$, and $L''$ are neighborhoods 
of $\infty$, $\infty'$, and $\infty''$, respectively. In the long run, the chain is typically 
far to the right in one of these sets. If it is in $L$ or $L''$, it must remain 
there for a long time; but if it is in $L'$, it can leave in one step by switch- 
ing to $L$ or $L''$. This behavior is what makes $\infty'$ nonminimal. In other 
words, far out in $L$ is near $\infty$, far out in $L''$ is near $\infty''$, but far out in $L'$ 
means near $\infty$ with probability $p$ and near $\infty''$ with probability $q$. 

Now let us consider the boundary of P. We are to look at the limit- 
ing behavior of along sequences of a’s. But for 

fixed 6, this quantity tends to the same limit along all three lines. 
We thus have just one limit point do, and the corresponding measure is 



J{db,b) = °'S(d^,b) + = 



1 if 6 e L' 

0 otherwise. 
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This measure is regular for Q, and consequently cfc is an extreme 
boundary point. By Proposition 11-15, j$(ob) = 1. For P the chain 
either is far out in U (and hence has to stay there for a long time) or 
else is in a set from which it can move in one step at any moment to a 
position far out in i'. The three boundary points for P collapse into 
one for P because a position far out on L or U' is only one step away 
from being far out on U. 

We conclude by sketching a proof that the $0'$th row of $P^n$ does not 
converge weak-star as $n$ tends to infinity if the $\beta_i$'s are chosen 
appropriately. 

Thus some condition such as $B = B_e$ is needed in Theorem 11-14, 
and some condition such as $T$-continuity is needed in Theorem 11-13. 
Let $h$ be the characteristic function of $L'$. Then $h$ is continuous, since 
$h(\infty) = h(\infty'') = 0$ and $h(\infty') = 1$; but $(Th)(\infty') = 0$ and thus $h$ is 
not $T$-continuous. Let 



$$a_n = (P^nh)_{0'} = \Pr_{0'}[x_n \in L'].$$

We shall show that $\{a_n\}$ does not converge if the $\beta_i$'s are chosen suit- 
ably, and hence $P^n_{0'\cdot}$ does not converge weak-star. In fact, we shall 
indicate that $\{a_n\}$ can fail to be even Abel summable. We define 

$$A(t) = \sum a_nt^n, \quad B(t) = \sum \beta_nt^n, \quad P(t) = \sum P^{(n)}_{0'0'}t^n, \quad F(t) = \sum F^{(n)}_{0'0'}t^n.$$

If we let $k$ be the last time before $n$ that the process is in $0'$, we see 
that 

$$a_n = \sum_{k=0}^{n} P^{(k)}_{0'0'}\,\beta_{n-k}$$

and hence 

$$A(t) = P(t)B(t).$$

For any chain, we have 

$$P(t) = \frac{1}{1 - F(t)}.$$

For this chain, 

$$F^{(2n+2)}_{0'0'} = \beta_n - \beta_{n+1},$$

whereas $F^{(m)}_{0'0'} = 0$ if $m$ is not of the form $2n + 2$. Thus 

$$F(t) = 1 - (1 - t^2)B(t^2).$$

Combining our results, we have 

$$A(t) = \frac{B(t)}{(1 - t^2)B(t^2)}.$$

The Abel limit of the sequence is 

$$\lim_{t \uparrow 1}\,(1 - t)A(t) = \tfrac12 \lim_{t \uparrow 1} \frac{B(t)}{B(t^2)}.$$

Now the sequence $\{\beta_n\}$ which defines $B(t)$ is an arbitrary decreasing 
sequence of positive numbers subject only to the conditions that 
$\lim \beta_n = 0$, $\sum \beta_n = +\infty$, and $\beta_0 = \beta_1 = 1$. Such a sequence can be 
found so that the expression $B(t)/B(t^2)$ oscillates as $t \uparrow 1$. 



9. Problems 

1. For sums of independent random variables on the integers with $p_{-1} = 
\tfrac23$ and $p_2 = \tfrac13$, show that if $h$ is continuous and is of finite deviation, then 
$\alpha f = 0$. 

2. Prove that if $h$ and $Th$ are both continuous, then $h$ is $T$-continuous. 

3. Show that for any normal chain $(I - P)C = b\alpha - I$. Identify the 
vector $b$. 

4. What identities hold for (I — P)K and K{I — P) in a normal chain? 

5. Let $P$ be normal and let $h$ be continuous. The balayage potential of $h$ 
on a small ergodic set $E$ is $B^Eh - (\lambda^Eh)\mathbf{1}$ (see Proposition 9-43). Let 
$\{E_k\}$ be an increasing sequence of finite sets with union $S$. Prove that 
the balayage potentials of $h$ on $E_k$ converge to $h - (\int h\,d\beta)\mathbf{1}$ on $S$. 

6. Show that if P is a normal null chain with B = Bg and if a: is a point of 
the boundary with a neighborhood in S* containing no other limit points, 
then lim,j e E] exists for all sufficiently small neighborhoods E and 
is the same for all such sets E. 

7. Prove that in a normal chain the elementary continuous functions are 
exactly the functions that can be written as the sum of a constant and a 
potential of finite support. [Hint: Use Theorems 9-15 and 11-18.] 

8. Prove that if P is a normal null chain and if P.y has only finitely many 
non-zero entries, then jSy = 0. [Hint: Consider columns of / — P as 
charges.] 

9. Show that every $T$-continuous potential in a normal chain is the uniform 
limit of potentials of finite support. 

Problems 10 to 15 refer to the null chain of Chapter 9, Problems 11 to 22. 

We shall use the notation and results there developed. 

10. Show that the recurrent entrance boundary is empty and that = I, 

11. Show that S is the unit interval when parametrized by t = lim (xjn) and 
that 

- <)“-'• 

12. What is the form of the most general non-negative function regular for P 
except at 0 ? 
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13. In Problem 12, let Aq = ^ use Lebesgue measure on £. Verify that 
h is regular except at 0. 

14. Show that is Lebesgue measure on JB, 

15. Verify the interpretations of p and ^ as limits of rows of P”* and by 
showing this: For either chain n is most probably large in the long run. 
For P, the ratio xjn cannot change much in a few steps (if n is large), 
but for P a transition to 0 is always possible. 




CHAPTER 12 



INTRODUCTION TO RANDOM FIELDS 

David Griffeath 



1. Markov fields 

One means of generalizing denumerable stochastic processes $\{x_n\}$ 
with time parameter set $N = \{0, 1, \ldots\}$ is to consider random fields 
$\{x_t\}$, where $t$ takes on values in an arbitrary countable parameter set $T$. 
Roughly, a random field with denumerable state space $S$ is described 
by a probability measure $\mu$ on the space $\Omega = S^T$ of all configurations of 
values from $S$ on the generalized time set $T$. In this chapter we discuss 
certain extensions of Markov chains, called Markov fields, which have 
been important objects of study in the recent development of proba- 
bility theory. Only some of the highlights of this rich theory will be 
covered; we concentrate especially on the case $T = Z$ = the in- 
tegers, where the connections with classical Markov chain theory are 
deepest. 

Proceeding to the formal definitions, assume as usual that the state 
space $S$ is a countable set of integers including 0, but let the time 
parameter set $T$ be any countable set. The configuration space $\Omega = S^T$ 
is the space of all functions $\omega$ from $T$ to $S$. An element $\omega = \{\omega_t;\ t \in T\}$ 
of $\Omega$ is called a configuration, and is to be thought of as an assignment 
of values from $S$ to the sites $t$ of $T$. The outcome function $x_t$ from $\Omega$ 
to $S$ takes the configuration $\omega$ to its value $\omega_t$ at site $t$. Let $\mathcal{B}$ be the 
minimal complete $\sigma$-algebra with respect to which all the outcome 
functions $x_t$, $t \in T$, are measurable. In this context, we introduce the 
following definition. 



Definition 12-1: A random field is given by $(\Omega, \mathcal{B}, \mu, \{x_t\})$, where $\mu$ is 
a probability measure on $(\Omega, \mathcal{B})$ such that 

$$\Pr[x_t = i_t \text{ for all } t \in A] > 0$$

for all finite (non-empty) $A \subset T$ and arbitrary $i_t \in S$. (As always, 
$\Pr[p] = \mu(\{\omega \mid p\})$.) 

As in the case of stochastic processes, we often identify a random 
field with its outcome functions $\{x_t\}$ (the remaining structure being 
understood). At other times it will be more convenient to think of $\mu$ 
as the random field. Our positivity assumption on cylinder proba- 
bilities ensures that all conditional probability statements are well- 
defined. The role of transition probabilities at this new level of 
generality is played by characteristics, which we define next. 



Definition 12-2: Given a random field $\{x_t\}$ and finite (non-empty) 
sets $A$ and $\Lambda$ such that $A \subset \Lambda$, the $(\Lambda, A)$-characteristic is the real- 
valued function on $\Omega$ given by 

$$\mu^\Lambda_A(i) = \Pr[x_t = i_t \text{ for all } t \in A \mid x_t = i_t \text{ for all } t \in \Lambda - A]$$

when evaluated at the configuration $i = \{i_t;\ t \in T\}$. For $a \in T$, we 
abbreviate $\mu^\Lambda_{\{a\}} = \mu^\Lambda_a$; the collection $\{\mu^\Lambda_a;\ a \in \Lambda \subset T\}$ is called the local 
characteristics of the random field. 

Throughout this chapter $A$ and $\Lambda$ will always be finite subsets of $T$, 
even when not explicitly identified as such. 

Our immediate objective is to formulate the notion of a Markov field. 
As motivation, we return briefly to the setting of Chapter 4. 



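Definition 12-3: A denumerable stochastic process $\{x_n\}$ satisfies the 
two-sided Markov property if 

$$\Pr[x_n = i_n \mid x_k = i_k,\ k \in \{m, m+1, \ldots, M\} - \{n\}]
= \begin{cases}
\Pr[x_0 = i_0 \mid x_1 = i_1] & \text{if } 0 = m = n < M \\
\Pr[x_n = i_n \mid x_{n-1} = i_{n-1} \wedge x_{n+1} = i_{n+1}] & \text{if } 0 \le m < n < M
\end{cases}$$

whenever $\Pr[x_k = i_k;\ m \le k \le M] > 0$. 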



Proposition 12-4: Any Markov chain $\{x_n\}$ satisfies the two-sided 
Markov property. 

Proof: Let $\pi$ be the starting distribution, $P$ the transition matrix for 
$\{x_n\}$. When $\Pr[x_k = i_k;\ m \le k \le M] > 0$, we consider the quantity 
$\Pr[x_n = i_n \mid x_k = i_k,\ k \in \{m, m+1, \ldots, M\} - \{n\}]$. If $0 = m = n < 
M$ this is simply $\Pr[x_0 = i_0 \mid x_1 = i_1]$ by reversibility of the Markov 
property (Proposition 6-44). Otherwise the above conditional proba- 
bility may be evaluated as 

$$\frac{(\pi P^m)_{i_m}\prod_{k=m}^{M-1} P_{i_ki_{k+1}}}
{(\pi P^m)_{i_m}\prod_{k=m}^{n-2} P_{i_ki_{k+1}}\; P^{(2)}_{i_{n-1}i_{n+1}} \prod_{k=n+1}^{M-1} P_{i_ki_{k+1}}}
= \frac{P_{i_{n-1}i_n}P_{i_ni_{n+1}}}{P^{(2)}_{i_{n-1}i_{n+1}}}$$

$$= \frac{(\pi P^{n-1})_{i_{n-1}}P_{i_{n-1}i_n}P_{i_ni_{n+1}}}{(\pi P^{n-1})_{i_{n-1}}P^{(2)}_{i_{n-1}i_{n+1}}}
= \Pr[x_n = i_n \mid x_{n-1} = i_{n-1} \wedge x_{n+1} = i_{n+1}].$$
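The two-sided Markov property can be illustrated numerically on a small chain. The sketch below is an assumption-laden toy: a two-state chain with an arbitrarily chosen starting distribution `pi` and transition matrix `P`, observed at times 0 through `N`. It compares the probability of $x_n$ given all other coordinates with the probability given only $x_{n-1}$ and $x_{n+1}$, both computed by brute force from the joint law. All names are hypothetical.

```python
from itertools import product

pi = [0.3, 0.7]                      # assumed starting distribution
P = [[0.6, 0.4], [0.2, 0.8]]         # assumed strictly positive transition matrix
N = 4                                # the chain is observed at times 0..N

def path_prob(path):
    """Probability of the cylinder x_0 = path[0], ..., x_N = path[N]."""
    p = pi[path[0]]
    for a, b in zip(path, path[1:]):
        p *= P[a][b]
    return p

def cond_all(path, n):
    """Pr[x_n = i_n | x_k = i_k for all k != n]."""
    denom = sum(path_prob(path[:n] + (s,) + path[n + 1:]) for s in range(2))
    return path_prob(path) / denom

def cond_neighbors(path, n):
    """Pr[x_n = i_n | x_{n-1} = i_{n-1}, x_{n+1} = i_{n+1}] from the joint law."""
    num = den = 0.0
    for other in product(range(2), repeat=N + 1):
        if other[n - 1] == path[n - 1] and other[n + 1] == path[n + 1]:
            p = path_prob(other)
            den += p
            if other[n] == path[n]:
                num += p
    return num / den

# Largest discrepancy over all paths and all interior times 0 < n < N.
gap = max(abs(cond_all(w, n) - cond_neighbors(w, n))
          for w in product(range(2), repeat=N + 1)
          for n in range(1, N))
print(gap)
```

The discrepancy should vanish up to floating-point rounding, since both conditional probabilities reduce to $P_{i_{n-1}i_n}P_{i_ni_{n+1}}/P^{(2)}_{i_{n-1}i_{n+1}}$.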

The two-sided Markov property, unlike the ordinary Markov property, 
generalizes to any parameter set $T$ which has a neighbor system, i.e., a 
collection $\partial = \{\partial a;\ a \in T\}$ of finite subsets of $T$ such that (i) $a \notin \partial a$, 
and (ii) $a \in \partial b$ if and only if $b \in \partial a$, $a, b \in T$. The sites $b \in \partial a$ are called 
the neighbors of $a$. We write $\bar a = \{a\} \cup \partial a$. Also, for $A \subset T$ let 

$$\partial A = \{b \in T - A \mid b \in \partial a \text{ for some } a \in A\}; \qquad \bar A = A \cup \partial A.$$

Definition 12-5: Let $T$ have neighbor system $\partial$. The random field 
$\{x_t;\ t \in T\}$ is a Markov field (with respect to $\partial$) if 

$$\mu^\Lambda_a = \mu^{\bar a}_a \quad\text{whenever } \bar a \subset \Lambda \subset T,\ \Lambda \text{ finite.}$$

We shall usually assume an underlying neighbor system for $T$, and 
simply refer to the Markov field $\{x_t\}$. Note that any Markov chain 
with strictly positive cylinder probabilities is a random field, where 
$T = N$. The natural neighbor structure on $N$ is $\partial 0 = \{1\}$, and for 
$n \ge 1$, $\partial n = \{n - 1, n + 1\}$. In this case the Markov random field 
condition is precisely the two-sided Markov property. Proposition 
12-4 shows that any Markov chain with positive cylinders may be con- 
sidered as a Markov field on $\Omega = S^N$. Later we will see that the classes 
of Markov processes with positive cylinders and Markov fields on $S^N$ 
actually coincide. 

A random field is called finite when T is a finite set. Such fields have 
an elementary theory, which will be developed in the next two sections. 
First, though, we note that the Markov field property simplifies some- 
what when T is finite. 

Proposition 12-6: Let $\{x_t\}$ be a finite random field. Then the follow- 
ing three conditions are equivalent: 

(1) $\{x_t\}$ is a Markov field. 

(2) $\mu^T_a = \mu^{\bar a}_a$ for all $a \in T$. 

(3) $\mu^T_a(i) = \mu^T_a(i')$ whenever $a \in T$ and $i_t = i'_t$ for all $t \in \bar a$. 

Proof: When $T$ is finite, (2) is simply the Markov field property 
with $\Lambda = T$, while the fact that $\mu^{\bar a}_a$ depends only on $i_t$, $t \in \bar a$, together with 
(2) implies $\mu^T_a(i) = \mu^{\bar a}_a(i) = \mu^{\bar a}_a(i') = \mu^T_a(i')$ whenever $i$ and $i'$ agree on $\bar a$. 

To see that (3) implies (2), fix $a \in T$ and let $k = \{k_t;\ t \in T - \bar a\}$ be any 
prescription of values from $S$ on $T - \bar a$. Denote by $p$, $q$, and $r_k$ the 
statements 

$$x_a = i_a, \qquad x_t = i_t \text{ for all } t \in \partial a, \qquad x_t = k_t \text{ for all } t \in T - \bar a,$$

respectively. Then (3) asserts that 

$$\Pr[p \mid q \wedge r_k] = c \quad\text{for all } k,$$

or equivalently, 

$$\Pr[p \wedge q \wedge r_k] = c\,\Pr[q \wedge r_k].$$

Summing over all possible $k$, we obtain $\Pr[p \mid q] = c$. Thus $\Pr[p \mid q \wedge r_k] 
= \Pr[p \mid q]$, which is precisely (2) when $k_t = i_t$ for all $t \in T - \bar a$. Finally, 
to show that (2) implies (1) we choose $\bar a \subset \Lambda \subset T$ and $i = \{i_t\} \in \Omega$. 
Since $\mu^{\bar a}_a$ depends only on sites in $\bar a$, (2) yields 

$$\Pr[x_t = i_t \text{ for all } t \in \Lambda \wedge x_t = k_t \text{ for all } t \in T - \Lambda]$$
$$= \Pr[x_t = i_t \text{ for all } t \in \Lambda - \{a\} \wedge x_t = k_t \text{ for all } t \in T - \Lambda]\;\mu^{\bar a}_a(i),$$

where $k = \{k_t;\ t \in T - \Lambda\}$ is any prescription of values from $S$ on 
$T - \Lambda$. Summing over all possible $k$, we conclude that $\mu^\Lambda_a(i) = \mu^{\bar a}_a(i)$. 

2. Finite Gibbs fields 

In this section we introduce an extremely useful representation for 
the measure $\mu$ of an arbitrary finite random field. The inspiration 
behind this approach (and hence most of its terminology) is derived 
from statistical mechanics, where random fields may be considered as 
equilibrium distributions for a variety of physical systems. 

Definition 12-7: A potential $U$ on a finite set $T$ is a family 
$\{U_A;\ A \subset T\}$ of functions from $\Omega$ to the real line $\mathbb{R}$ with the property 
that $U_A(i) = U_A(i')$ whenever $i_t = i'_t$ for all $t \in A$, and such that 
$U_\emptyset = 0$. The energy $H_U$ of the potential $U$ is given by 

$$H_U = \sum_{A \subset T} U_A.$$

$U$ is said to be normalized if $U_A(i) = 0$ whenever $i_a = 0$ for some $a \in A$. 
When $T$ has a neighbor system $\partial$, a set $C \subset T$ is called a clique if 
$b \in \partial a$ whenever $a, b \in C$, $a \ne b$, i.e. if every two distinct sites in $C$ are 
neighbors. Let $\mathcal{C}$ be the class of all cliques in $T$. $U$ is called a 
neighbor potential if $U_A = 0$ whenever $A \notin \mathcal{C}$. 

Definition 12-8: A finite random field $\{x_t\}$ is a Gibbs field with potential 
$U$ if 

$$\mu(\{i\}) = z^{-1}e^{H_U(i)}, \qquad i \in \Omega,$$

where $z = \sum_{i \in \Omega} \exp\{H_U(i)\}$ is often called the partition function. Im- 
plicit is the assumption $z < +\infty$, which imposes a condition on 
$U$. If $U$ is a neighbor potential, then $\{x_t\}$ is called a neighbor Gibbs 
field, and $H_U = \sum_{C \in \mathcal{C}} U_C$. 

We remark that the potential and energy of random field theory 
should not be confused with those of Markov chain theory presented in 
earlier chapters. These terms have common origins in classical physics. 

Example 12-9: $T$ is sometimes called a cubic lattice if no clique in the 
neighbor system for $T$ has more than two elements. The most im- 
portant examples are subsets of the $d$-dimensional integer lattices $Z^d$, 
where $\partial a = \{b \in T : |a - b| = 1\}$. When $T$ is a finite cubic lattice and $U$ 
is a neighbor potential, the energy function becomes 

$$H_U = \sum_{a \in T} U_{\{a\}} + \sum_{\{a,b\} \in \mathcal{N}} U_{\{a,b\}}, \qquad\text{where } \mathcal{N} = \{\{a, b\} : b \in \partial a\}.$$

In this case $U$ is called a neighbor pair potential. 

Two lemmas prepare the way for the representation theorem for 
finite random fields. Given $i = \{i_t\} \in \Omega$ and $A \subset T$, the modification 
$i^A = \{i^A_t\}$ of $i$ has values 

$$i^A_t = \begin{cases}
i_t & \text{for } t \in A \\
0 & \text{otherwise.}
\end{cases}$$

We abbreviate $i^{A \cup \{a\}} = i^{A+a}$ when $a \notin A$ and $i^{A - \{a\}} = i^{A-a}$ when 
$a \in A$. 

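Lemma 12-10: If $\{x_t\}$ is a finite random field, $i = \{i_t\} \in \Omega$, $A \subset T$ 
and $a \notin A$, then 

$$\frac{\mu(\{i^{A+a}\})}{\mu(\{i^A\})} = \frac{\mu^T_a(i^{A+a})}{\mu^T_a(i^A)}.$$

Proof: $\mu^T_a(i) = \Pr[x_t = i_t \text{ for all } t \in T]/\Pr[x_t = i_t \text{ for all } t \in T - \{a\}]$. 
When we replace $i^{A+a}$ by $i^A$ the denominator is unchanged, since $i^{A+a}_t = 
i^A_t$ for $t \in T - \{a\}$. 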

Lemma 12-11 (Möbius inversion formula): Let $\Lambda$ be finite, and let $\Phi$ 
and $\Psi$ be real-valued set functions defined on all subsets of $\Lambda$. Then 

$$\Phi(A) = \sum_{B \subset A} (-1)^{|A - B|}\,\Psi(B) \quad\text{for all } A \subset \Lambda$$

if and only if 

$$\Psi(A) = \sum_{B \subset A} \Phi(B) \quad\text{for all } A \subset \Lambda.$$

(Here $|A|$ denotes the cardinality of the set $A$.) 

Proof: Assume that the first condition holds. Then 

$$\sum_{B \subset A} \Phi(B) = \sum_{B \subset A}\sum_{D \subset B} (-1)^{|B - D|}\,\Psi(D)
= \sum_{D \subset A} \Psi(D)\Bigl[\sum_{E \subset A - D} (-1)^{|E|}\Bigr] \quad (E = B - D)$$
$$= \Psi(A),$$

since the bracketed sum above is 1 if $D = A$ and 0 otherwise. The 
opposite implication is verified by an analogous computation. 
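A brute-force check of the inversion formula on a four-element ground set may help fix the conventions. This is only a sketch: the function names `subsets`, `zeta`, and `mobius` are hypothetical, and integer values are used so that the comparison is exact.

```python
from itertools import combinations
import random

def subsets(s):
    """All subsets of s, as frozensets."""
    items = tuple(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def zeta(phi, ground):
    """Psi(A) = sum of Phi(B) over all B contained in A."""
    return {A: sum(phi[B] for B in subsets(A)) for A in subsets(ground)}

def mobius(psi, ground):
    """Phi(A) = sum of (-1)^{|A - B|} Psi(B) over all B contained in A."""
    return {A: sum((-1) ** len(A - B) * psi[B] for B in subsets(A))
            for A in subsets(ground)}

ground = frozenset({0, 1, 2, 3})
rng = random.Random(0)
phi = {A: rng.randint(-5, 5) for A in subsets(ground)}  # arbitrary set function

psi = zeta(phi, ground)          # sum over subsets
recovered = mobius(psi, ground)  # alternating sum inverts it
print(recovered == phi)  # True
```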



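Theorem 12-12: Let $\{x_t\}$ be a finite random field with local character- 
istics $\{\mu^T_a\}$. Then $\{x_t\}$ is a Gibbs field with canonical potential $V$ defined 
by $V_\emptyset = 0$, and for $A \ne \emptyset$, 

$$V_A(i) = \sum_{B \subset A} (-1)^{|A - B|}\ln \mu(\{i^B\})
= \sum_{B \subset A} (-1)^{|A - B|}\ln \mu^T_a(i^B)$$

for any fixed $a \in A$. Moreover, $V$ is the unique normalized potential 
for $\{x_t\}$. 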


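Proof: Let $0$ denote the configuration with a 0 at every site of $T$. 
Fix $i \in \Omega$. For $A \subset T$, define $\Psi(A) = \ln[\mu(\{i^A\})/\mu(\{0\})]$ and $\Phi(A) = 
V_A(i)$, where $V$ is the potential given by the first sum in the 
Theorem. When $A \ne \emptyset$ we have $\sum_{B \subset A} (-1)^{|A - B|} = 0$, and hence 

$$\Phi(A) = V_A(i) = \sum_{B \subset A} (-1)^{|A - B|}\ln \mu(\{i^B\})
- \Bigl[\sum_{B \subset A} (-1)^{|A - B|}\Bigr]\ln \mu(\{0\})
= \sum_{B \subset A} (-1)^{|A - B|}\,\Psi(B).$$

When $A = \emptyset$, $\Phi(\emptyset) = V_\emptyset(i) = 0 = \ln[\mu(\{i^\emptyset\})/\mu(\{0\})] = \Psi(\emptyset)$, since 
$i^\emptyset = 0$ for any $i$. Applying Lemma 12-11 with $\Lambda = T$ we conclude that 

$$\ln\frac{\mu(\{i\})}{\mu(\{0\})} = \sum_{B \subset T} V_B(i) = H_V(i),$$

and hence 

$$\mu(\{i\}) = \mu(\{0\})\,e^{H_V(i)} = z^{-1}e^{H_V(i)}.$$

Thus $\{x_t\}$ is a Gibbs field with potential $V$. For any $a \in A \subset T$ we 
can write 

$$V_A(i) = \sum_{B \subset A - a} (-1)^{|A - B|}\ln\frac{\mu(\{i^B\})}{\mu(\{i^{B+a}\})}.$$

This shows that $V$ is normalized, since $i^{B+a} = i^B$ whenever $i_a = 0$. By 
Lemma 12-10 the right hand side of the last equation may be rewritten as 

$$\sum_{B \subset A - a} (-1)^{|A - B|}\ln\frac{\mu^T_a(i^B)}{\mu^T_a(i^{B+a})}
= \sum_{B \subset A} (-1)^{|A - B|}\ln \mu^T_a(i^B),$$

which establishes the second expression for $V$ in the statement of the 
Theorem. Finally, suppose that $U$ is any normalized potential for $\{x_t\}$. 
Then $H_U(0) = 0$ and $U_D(i^B) = 0$ unless $D \subset B$. Therefore 

$$\ln\frac{\mu(\{i^B\})}{\mu(\{0\})} = H_U(i^B) = \sum_{D \subset B} U_D(i)$$

whenever $B \subset A \subset T$. If we apply Lemma 12-11 with $\Lambda = A$, 
$\Phi(D) = U_D(i)$ and $\Psi(B) = \ln[\mu(\{i^B\})/\mu(\{0\})]$, the conclusion is 

$$U_A(i) = \sum_{B \subset A} (-1)^{|A - B|}\ln \mu(\{i^B\}) = V_A(i),$$

since the last sum is 0 when $A = \emptyset$, and otherwise 
$\ln \mu(\{0\})\sum_{B \subset A} (-1)^{|A - B|} = 0$. 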



Corollary 12-13: A finite random field is completely determined by its 
local characteristics. 

Proof: The second equation in Theorem 12-12 shows that the 
canonical potential $V$ for $\mu$ is determined by the local characteristics, 
and $V$ determines $\mu$. 
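The canonical potential can be computed by brute force for a very small field, which makes the Gibbs representation and the normalization property concrete. The sketch below assumes three sites, three states with 0 as the reference state, and an arbitrary strictly positive measure generated from random weights; it uses the alternating-sum formula $V_A(i) = \sum_{B \subset A} (-1)^{|A-B|} \ln \mu(\{i^B\})$ with $V_\emptyset = 0$. All names are hypothetical.

```python
from itertools import combinations, product
from math import exp, log
import random

T = (0, 1, 2)   # sites (assumed)
S = (0, 1, 2)   # states, with 0 playing the role of the reference value
rng = random.Random(1)
configs = list(product(S, repeat=len(T)))
weights = {c: rng.uniform(0.5, 2.0) for c in configs}
total = sum(weights.values())
mu = {c: w / total for c, w in weights.items()}   # strictly positive measure

def subsets(s):
    """All subsets of s, as frozensets."""
    items = tuple(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def modify(i, B):
    """The modification i^B: keep i on B and put 0 elsewhere."""
    return tuple(i[t] if t in B else 0 for t in T)

def V(A, i):
    """Canonical potential V_A(i); V of the empty set is 0 by definition."""
    if not A:
        return 0.0
    return sum((-1) ** (len(A) - len(B)) * log(mu[modify(i, B)])
               for B in subsets(A))

def energy(i):
    """H_V(i), the sum of V_A(i) over all subsets A of T."""
    return sum(V(A, i) for A in subsets(T))

zero = (0,) * len(T)
# Gibbs representation: mu(i) = mu(0) * exp(H_V(i)).
gibbs_gap = max(abs(mu[i] - mu[zero] * exp(energy(i))) for i in configs)
# Normalization: V_A(i) = 0 whenever some site of A carries the value 0.
norm_gap = max(abs(V(A, i)) for i in configs for A in subsets(T)
               if any(i[a] == 0 for a in A))
print(gibbs_gap, norm_gap)
```

Both gaps should be zero up to rounding. Here $\mu(\{0\})$ plays the role of $z^{-1}$, because $H_V(0) = 0$ for a normalized potential.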



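Proposition 12-14: Let $\{x_t\}$ be a finite Gibbs field with potential $U$. 
Then the canonical potential $V$ for $\{x_t\}$ is related to $U$ by 

$$V_A(i) = \sum_{B \subset A \subset D \subset T} (-1)^{|A - B|}\,U_D(i^B).$$

Proof: Since $\{x_t\}$ is Gibbs, 

$$\ln\frac{\mu(\{i^B\})}{\mu(\{i^{B+a}\})} = \sum_{D \subset T}\bigl[U_D(i^B) - U_D(i^{B+a})\bigr]$$

for any $a \in T$ and $B \subset T - \{a\}$. Using the first equation for $V$ in 
Theorem 12-12, it now follows that whenever $a \in A \subset T$, 

$$V_A(i) = \sum_{B \subset A - a} (-1)^{|A - B|}\ln\frac{\mu(\{i^B\})}{\mu(\{i^{B+a}\})}
= \sum_{D \subset T}\sum_{B \subset A} (-1)^{|A - B|}\,U_D(i^B)$$
$$= \sum_{D \subset T}\ \sum_{B_1 \subset D \cap A}\Bigl[\sum_{B_2 \subset D^c \cap A} (-1)^{|A - B_1 - B_2|}\Bigr]U_D(i^{B_1}).$$

The inner sum in the last expression is 0 unless $D^c \cap A = \emptyset$, i.e., 
unless $A \subset D$. 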




Corollary 12-15: Given two potentials $U'$ and $U''$, let $U_D = U'_D - U''_D$. 
$U'$ and $U''$ determine the same finite Gibbs field if and only if 

$$\sum_{B \subset A \subset D \subset T} (-1)^{|A - B|}\,U_D(i^B) = 0 \quad\text{for every } A \ne \emptyset.$$

Proof: Letting $V'$ and $V''$ be the canonical potentials corresponding 
to $U'$ and $U''$ respectively, Proposition 12-14 shows that the given 
equation is equivalent to $V'_A = V''_A$. 

3. Equivalence of finite Markov and neighbor Gibbs fields 

We now prove an important equivalence theorem which states that 
the finite Markov fields are precisely those for which the canonical $V$ 
is a neighbor potential. 

Theorem 12-16: Let $\{x_t\}$ be a finite random field with canonical 
potential $V$. Then $\{x_t\}$ is a Markov field if and only if $V$ is a neighbor 
potential. 

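Proof: Fix $a \in T$ and $i, i' \in \Omega$ such that $i_t = i'_t$ whenever $t \in \bar a$. 
Let $i_s$ and $i'_s$ be the modifications of $i$ and $i'$ respectively obtained by 
replacing the value at site $a$ with $s \in S$. If $V$ is a neighbor potential, 
then 

$$H_V(i_s) = \sum_{A \subset T - \{a\}} V_A(i_s) + \sum_{a \in A \subset \bar a} V_A(i_s) + \sum_{a \in A \not\subset \bar a} V_A(i_s)
= \Sigma_1 + \Sigma_2(s) + 0,$$

where $\Sigma_1$ is independent of $s$. Let $\Sigma'_1$ and $\Sigma'_2(s)$ be the corresponding 
sums when $i_s$ is replaced by $i'_s$. Since $i_s$ and $i'_s$ agree on $\bar a$, we have 
$\Sigma_2(s) = \Sigma'_2(s)$ for all $s$. Thus 

$$\mu^T_a(i_{s_0}) = \frac{e^{\Sigma_1 + \Sigma_2(s_0)}}{\sum_{s \in S} e^{\Sigma_1 + \Sigma_2(s)}}
= \frac{e^{\Sigma'_1 + \Sigma'_2(s_0)}}{\sum_{s \in S} e^{\Sigma'_1 + \Sigma'_2(s)}} = \mu^T_a(i'_{s_0})$$

for any $s_0 \in S$. Taking $s_0 = i_a = i'_a$ we have verified (3) of Proposition 
12-6; this shows that $\{x_t\}$ is Markov. Conversely, if $\{x_t\}$ is Markov, we 
claim that the canonical potential $V$ is a neighbor potential. To see 
this, choose $a, b \in A \subset T$ such that $b \notin \bar a$. Expand $V$ as 

$$V_A(i) = \sum_{B \subset A - \{a,b\}} (-1)^{|A - B|}\ln\frac{\mu^T_a(i^B)\,\mu^T_a(i^{B+a+b})}{\mu^T_a(i^{B+a})\,\mu^T_a(i^{B+b})}.$$

Since $b \notin \bar a$, Proposition 12-6 shows that $\mu^T_a(i^B) = \mu^T_a(i^{B+b})$ and 
$\mu^T_a(i^{B+a+b}) = \mu^T_a(i^{B+a})$, yielding the desired result. 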
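The direction from neighbor potentials to the Markov property can be observed numerically on a small lattice. The sketch below assumes four sites on a path, two states, and randomly chosen single-site and neighbor-pair potentials (not normalized, which this direction of the argument does not require); it checks that the local characteristic at each site is unchanged when the configuration is altered outside the closed neighborhood $\bar a$. All names are hypothetical.

```python
from itertools import product
from math import exp
import random

SITES = range(4)                                   # path 0 - 1 - 2 - 3 (assumed)
NBRS = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}      # neighbor system
S = (0, 1)
rng = random.Random(2)
single = {(a, s): rng.uniform(-1, 1) for a in SITES for s in S}
pair = {(a, s, u): rng.uniform(-1, 1) for a in range(3) for s in S for u in S}

def energy(i):
    """H_U(i): only singletons and neighboring pairs (the cliques) contribute."""
    return (sum(single[a, i[a]] for a in SITES)
            + sum(pair[a, i[a], i[a + 1]] for a in range(3)))

configs = list(product(S, repeat=4))
z = sum(exp(energy(i)) for i in configs)           # partition function
mu = {i: exp(energy(i)) / z for i in configs}

def local(a, i):
    """mu_a^T(i) = Pr[x_a = i_a | x_t = i_t for all t != a]."""
    denom = sum(mu[i[:a] + (s,) + i[a + 1:]] for s in S)
    return mu[i] / denom

gap = 0.0
for a in SITES:
    closed = NBRS[a] | {a}                         # the closed neighborhood of a
    for i in configs:
        for k in configs:
            if all(i[t] == k[t] for t in closed):
                gap = max(gap, abs(local(a, i) - local(a, k)))
print(gap)
```

A gap of zero up to rounding corresponds to condition (3) of Proposition 12-6 for this field.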



Corollary 12-17: There is a one-to-one correspondence between the 
local characteristics a e T} for finite Markov fields and normalized 
(canonical) potentials V for finite neighbor Gibbs fields, given by 



and 



= 2 aeAe^ 

Bc:A 

(so 




where z is the appropriate normalizing constant. 



Proof: If {x^} is Markov, then F^(0 = 0 for by the last 

theorem. The rest of the first equation above was proved in Theorem 
12-12, and the second equation was derived in Theorem 12-16. 

Another consequence of Theorem 12-16 is an alternative formulation 
for finite Markov fields. 



Corollary 12-18: A finite random field is Markov if and only if 

whenever A A T. 

Proof: Let $\{x_t\}$ be Markov, with canonical neighbor potential V.
A computation similar to the one in the proof of Theorem 12-16 shows
that

H'aU) = 2“^ expj 2 

^B<=A:BnA^0 ) 

whenever = i[ for oXite A. The claim now follows from the straight- 
forward generalization of Proposition 12-6 obtained by replacing {a} 
with A, 



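Example 12-19: The Markov process case. Let $\{x_n;\ 0 \le n \le N\}$ be
a denumerable Markov process viewed from time 0 to time N. Suppose
that $\{x_n\}$ has starting distribution $\pi$ and one-step transition matrix $P_n$
at time n, where

\[ P_{n,ij} = \Pr[x_{n+1} = j \mid x_n = i]. \]

If $\pi$ and all the $P_n$ are strictly positive, then $\{x_n\}$ is a Markov field with
neighbor system $\partial$, where $\partial 0 = \{1\}$, $\partial N = \{N-1\}$, and $\partial n = \{n-1, n+1\}$
for $0 < n < N$. Writing $\mu_n(\iota) = \Pr[x_n = i_n \mid x_t = i_t,\ t \ne n]$, the local
characteristics for $\{x_n\}$ are given by

\[
\begin{aligned}
\mu_0(\iota) &= \frac{\pi_{i_0} P_{0,i_0 i_1}}{\sum_{i \in S} \pi_i P_{0,i i_1}}, \\
\mu_n(\iota) &= \frac{P_{n-1,i_{n-1} i_n}\, P_{n,i_n i_{n+1}}}{\sum_{i \in S} P_{n-1,i_{n-1} i}\, P_{n,i\, i_{n+1}}}, \qquad 0 < n < N, \\
\mu_N(\iota) &= P_{N-1,i_{N-1} i_N}.
\end{aligned}
\]

The canonical potential for the process is then given by

\[
\begin{aligned}
V_{\{n\}}(\iota) &= \ln \frac{\mu_n(\iota^{\{n\}})}{\mu_n(\iota^{\emptyset})}, \\
V_A(\iota) &= \ln \frac{P_{n-1,i_{n-1} i_n}\, P_{n-1,00}}{P_{n-1,i_{n-1} 0}\, P_{n-1,0 i_n}}, \qquad A = \{n-1, n\}, \\
&= 0 \qquad \text{otherwise},
\end{aligned}
\]

$0 < n \le N$, and $\{x_n\}$ is a neighbor Gibbs field with normalized potential
V. Conversely, suppose that $\{x_n;\ 0 \le n \le N\}$ is a Markov field with
the above neighbor system $\partial$. A routine calculation using the explicit
representation for $\mu$ in terms of V shows that

\[ \Pr[x_{n+1} = j \mid x_n = i_n, \ldots, x_0 = i_0] = \Pr[x_{n+1} = j \mid x_n = i_n] \]

whenever $0 \le n < N$, so $\{x_n\}$ is a Markov process. On a finite time
parameter set we see that the one and two sided Markov properties are
equivalent, so that Markov fields are precisely the Markov processes
with positive cylinder probabilities.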
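The two-sided conditional probabilities of Example 12-19 can be checked by brute force on a short chain. The sketch below uses a hypothetical three-state chain observed at times 0 through 3, with made-up matrices `P[0]`, `P[1]`, `P[2]`; it compares the conditional probability of $x_n$ given all other coordinates, computed from the joint law, with the product formula for interior sites and with the last-step transition probability at the right endpoint.

```python
import itertools

pi = [0.5, 0.3, 0.2]   # hypothetical strictly positive start distribution
P = [                  # hypothetical strictly positive matrices P_0, P_1, P_2
    [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]],
    [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.25, 0.25, 0.5]],
    [[0.6, 0.2, 0.2], [0.3, 0.5, 0.2], [0.1, 0.1, 0.8]],
]
S = range(3)
N = 3

def joint(cfg):
    """Probability of the path cfg = (i_0, ..., i_N)."""
    p = pi[cfg[0]]
    for n in range(N):
        p *= P[n][cfg[n]][cfg[n + 1]]
    return p

def local_char(n, cfg):
    """Pr[x_n = i_n | x_t = i_t for all t != n], from the joint law."""
    total = sum(joint(cfg[:n] + (s,) + cfg[n + 1:]) for s in S)
    return joint(cfg) / total

def interior_formula(n, cfg):
    """P_{n-1,ab} P_{n,bc} / sum_s P_{n-1,as} P_{n,sc} with a, b, c = i_{n-1}, i_n, i_{n+1}."""
    a, b, c = cfg[n - 1], cfg[n], cfg[n + 1]
    return P[n - 1][a][b] * P[n][b][c] / sum(P[n - 1][a][s] * P[n][s][c] for s in S)

paths = list(itertools.product(S, repeat=N + 1))
interior_gap = max(abs(local_char(n, cfg) - interior_formula(n, cfg))
                   for cfg in paths for n in (1, 2))
endpoint_gap = max(abs(local_char(N, cfg) - P[N - 1][cfg[N - 1]][cfg[N]])
                   for cfg in paths)
```

Both gaps should vanish up to rounding, which is the statement that conditioning on the whole path outside site n is the same as conditioning on the neighbors of n alone.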
4. Markov fields and neighbor Gibbs fields: the infinite case

For the remainder of the chapter, T will be a countably infinite set
with some neighbor system $\partial$. A neighbor potential U on T is defined
just as in the previous sections. But we can no longer define the
probability measure $\mu$ for a neighbor Gibbs field with potential U
explicitly; instead we must make use of the local characteristics.

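Definition 12-20: An infinite random field $\{x_t\}$ is a neighbor Gibbs field
with neighbor potential U if

\[
\mu_a^{\Lambda}(\iota) = z^{-1} \exp\Bigl\{ \sum_{a \in B \subset \bar a} U_B(\iota) \Bigr\}
\quad \text{for finite } \Lambda \supset \bar a,
\]

where z is the appropriate normalizing constant. If $s \in S$ and $\iota_s$ is the
modification obtained by replacing $i_a$ with s, then

\[
z = \sum_{s \in S} \exp\Bigl\{ \sum_{a \in B \subset \bar a} U_B(\iota_s) \Bigr\}.
\]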

Theorem 12-21: Let $\{x_t\}$ be an infinite random field. Then $\{x_t\}$ is a
neighbor Gibbs field if and only if it is a Markov field.

Proof: Suppose $\{x_t\}$ is neighbor Gibbs. From the definition we see
that $\mu_a^{\Lambda}(\iota) = \mu_a^{\Lambda}(\iota')$ whenever $\Lambda \supset \bar a$ and $\iota'$ agrees with $\iota$ on $\bar a$. Just as
in Proposition 12-6, this implies that $\mu_a^{\Lambda}(\iota) = \mu_a^{\bar a}(\iota)$ for all finite $\Lambda \supset \bar a$,
so $\{x_t\}$ is Markov. Conversely, if $\{x_t\}$ is Markov we define a potential V
by $V_\emptyset = 0$ and



= 2 aeA. 

B<=A 

The argument of Theorem 12-16 shows that V is a neighbor potential,
and the only normalized potential determined by the local
characteristics. An application of the Möbius inversion formula shows
that V satisfies the conditions in Definition 12-20, so $\{x_t\}$ is a
neighbor Gibbs field. We remark here that Corollary 12-18 also holds
for infinite fields; the proof is routine.

Let $\mathcal{G}_V = \mathcal{G}_V(T)$ be the class of neighbor Gibbs fields on the infinite
set T with canonical potential V. The bijection of Corollary 12-17,
which carries over to the finite subsets of T, shows that we may
consider equivalently the class of Markov fields with local characteristics
$\{\mu_a^{\Lambda}\}$ corresponding to V. Since we will be considering many fields
simultaneously, the elements of $\mathcal{G}_V$ will be thought of as measures $\mu$
governing $\{x_t\}$. In the finite case we have seen that $\mathcal{G}_V$ always contains
a single field. When T is infinite this need not be so; indeed, there are
exactly three possibilities:

(i) $\mathcal{G}_V = \emptyset$   (ii) $|\mathcal{G}_V| = 1$   (iii) $|\mathcal{G}_V| = \infty$.

This follows from the fact that $\mathcal{G}_V$ is convex, i.e., if $\mu_1, \mu_2 \in \mathcal{G}_V$ and
$0 < \alpha < 1$, then $\alpha\mu_1 + (1 - \alpha)\mu_2 \in \mathcal{G}_V$. Examples of (i)-(iii) will be
presented in Section 5.

Definition 12-22: When $|\mathcal{G}_V| = \infty$ we say that there is phase multiplicity
(or phase transition) for V. A measure $\mu \in \mathcal{G}_V$ is extreme if
whenever $\mu = \alpha\mu_1 + (1 - \alpha)\mu_2$, $\mu_1, \mu_2 \in \mathcal{G}_V$, $0 < \alpha < 1$, then $\mu_1 =
\mu_2 = \mu$. The class of extreme elements of $\mathcal{G}_V$ is denoted by $\mathcal{E}_V$.

Since $\mathcal{G}_V$ is convex, in the case of phase multiplicity one would hope
for an integral representation in terms of $\mathcal{E}_V$. We will obtain such a
representation, along with a number of other results on the structure of
$\mathcal{G}_V$, by connecting neighbor Gibbs fields with Martin boundaries for
certain Markov chains. The remainder of this section is devoted to the
study of general structural properties of $\mathcal{G}_V$ with the aid of the boundary
theory developed in Chapter 10.

To begin, we fix a neighbor potential V, assume $\mathcal{G}_V \ne \emptyset$, and choose
a reference measure $\nu \in \mathcal{G}_V$. Also we fix an increasing sequence
$\{\Lambda(n),\ n = 0, 1, \ldots\}$ of finite subsets of T, such that $\overline{\Lambda(n)} \subset \Lambda(n + 1)$
and $\Lambda(n) \uparrow T$ as $n \to \infty$. Write $K(0) = \Lambda(0)$, $K(n) = \Lambda(n) - \Lambda(n - 1)$
for $n \ge 1$. Then any configuration $\iota \in \Omega$ may be thought of as a
sequence of subconfigurations $(\kappa^0, \kappa^1, \ldots)$, where $\kappa^n \in S^{K(n)}$
satisfies $\kappa^n_t = i_t$ when $t \in K(n)$. For brevity's sake we denote
$\{\omega \mid x_t(\omega) = \kappa^n_t \text{ for all } t \in K(n)\}$ simply as $[\kappa^n]$. Similarly, $[\kappa^0, \kappa^1, \ldots, \kappa^n]$
means $\{\omega \mid x_t(\omega) = \kappa^r_t \text{ for all } t \in K(r),\ 0 \le r \le n\}$, and so forth. Also,
we write $\nu(A \mid B) = \nu(A \cap B)/\nu(B)$ when convenient (and, of course,
$\nu(B) > 0$). With these notations in effect, observe the following key
property of neighbor Gibbs fields.

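Proposition 12-23: If $\iota = (\kappa^0, \kappa^1, \ldots) \in \Omega$, then

\[
\nu([\kappa^{n+1}] \mid [\kappa^m, \ldots, \kappa^n]) = \nu([\kappa^{n+1}] \mid [\kappa^n]),
\qquad 0 \le m \le n < \infty.
\]

Proof: By the definition of conditional probability, the left hand side
divided by the right hand side is equal to

\[
\frac{\nu([\kappa^m, \ldots, \kappa^{n-1}] \mid [\kappa^n, \kappa^{n+1}])}{\nu([\kappa^m, \ldots, \kappa^{n-1}] \mid [\kappa^n])} = 1,
\]

since the numerator and denominator of this last quotient are both
determined by the configuration on $K(n)$ alone, by the Markov field
property applied to the characteristics of $\nu$.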

The above result reveals a Markovian structure for $\nu$ which can be
exhibited explicitly in terms of a Markov chain. The states for the
chain are all possible pairs $(n, \kappa^n)$, $n \ge 0$, $\kappa^n \in S^{K(n)}$. The transition
matrix is

\[ P_{(n,\kappa^n),(n+1,\kappa^{n+1})} = \nu([\kappa^{n+1}] \mid [\kappa^n]). \]

Proposition 12-23 and a simple induction show that the n-step transition
matrix is given by

\[ P^{(n)}_{(m,\kappa^m),(m+n,\kappa^{m+n})} = \nu([\kappa^{m+n}] \mid [\kappa^m]). \]

If the initial distribution is $\pi_{(0,\kappa^0)} = \nu([\kappa^0])$ and $\{y_n\}$ denotes the
resulting chain, then we obtain the simple relationship

\[ \Pr_\pi[y_n = (n, \kappa^n)] = \nu([\kappa^n]). \]

Next, we connect P-regular functions with fields in $\mathcal{G}_V$. A lemma will
be useful for this purpose.



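Lemma 12-24: If $\mu \in \mathcal{G}_V$ and $\iota = (\kappa^0, \kappa^1, \ldots) \in \Omega$, then

\[
\frac{\mu([\kappa^0, \kappa^1, \ldots, \kappa^n])}{\nu([\kappa^0, \kappa^1, \ldots, \kappa^n])}
= \frac{\mu([\kappa^m, \ldots, \kappa^n])}{\nu([\kappa^m, \ldots, \kappa^n])},
\qquad 0 \le m \le n < \infty.
\]

Proof: The left hand expression may be rewritten as

\[
\frac{\mu([\kappa^0, \ldots, \kappa^{m-1}] \mid [\kappa^m, \ldots, \kappa^n])\; \mu([\kappa^m, \ldots, \kappa^n])}
     {\nu([\kappa^0, \ldots, \kappa^{m-1}] \mid [\kappa^m, \ldots, \kappa^n])\; \nu([\kappa^m, \ldots, \kappa^n])},
\]

and the two characteristics agree, being identical functionals of the
potential V.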

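As in Section 10-6, we call a non-negative P-regular function h
normalized if

\[ \pi h = \sum_{\kappa^0 \in S^{K(0)}} \pi_{(0,\kappa^0)}\, h_{(0,\kappa^0)} = 1, \]

and minimal normalized if it cannot be written as a non-trivial convex
combination of two distinct non-negative regular functions.

Theorem 12-25: There is a one-to-one correspondence between
non-negative normalized P-regular functions h and neighbor Gibbs fields
$\mu \in \mathcal{G}_V$ given by

\[ h_{(n,\kappa^n)} = \frac{\mu([\kappa^n])}{\nu([\kappa^n])} \]

and

\[ \mu([\kappa^0, \ldots, \kappa^n]) = \nu([\kappa^0, \ldots, \kappa^n])\, h_{(n,\kappa^n)}, \]

$n \ge 0$, $\kappa^n \in S^{K(n)}$. Moreover, h is minimal normalized if and only if $\mu$
is in $\mathcal{E}_V$.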
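Proof: Given $\mu \in \mathcal{G}_V$, define h according to the first equation in the
theorem. By Lemma 12-24,

\[
\sum_{\kappa^{n+1}} P_{(n,\kappa^n),(n+1,\kappa^{n+1})}\, h_{(n+1,\kappa^{n+1})}
= \sum_{\kappa^{n+1}} \frac{\nu([\kappa^n, \kappa^{n+1}])}{\nu([\kappa^n])} \cdot \frac{\mu([\kappa^n, \kappa^{n+1}])}{\nu([\kappa^n, \kappa^{n+1}])}
= \frac{\mu([\kappa^n])}{\nu([\kappa^n])} = h_{(n,\kappa^n)},
\]

so h is P-regular. Clearly, h is also non-negative and normalized.
Conversely, let h be any non-negative normalized P-regular function.
We claim that the second equation of the theorem prescribes cylinder
probabilities for a unique measure $\mu \in \mathcal{G}_V$ determined by h. Note first
that h must be strictly positive. To see this, write

\[ 1 = \pi h = \sum_{\kappa^{n+1}} \nu([\kappa^{n+1}])\, h_{(n+1,\kappa^{n+1})} \]

and

\[ h_{(n,\kappa^n)} = \sum_{\kappa^{n+1}} \frac{\nu([\kappa^n, \kappa^{n+1}])}{\nu([\kappa^n])}\, h_{(n+1,\kappa^{n+1})}. \]

The first equation above implies that $h_{(n+1,\kappa^{n+1})} > 0$ for some $\kappa^{n+1}$, and
hence the second shows that $h_{(n,\kappa^n)} > 0$ for every $\kappa^n$, since $\nu$ is strictly
positive on cylinders. Thus all cylinder measures for $\mu$ are evidently
positive. Next, we use Proposition 12-23 to compute

\[
\sum_{\kappa^{n+1}} \mu([\kappa^0, \ldots, \kappa^{n+1}])
= \nu([\kappa^0, \ldots, \kappa^n]) \sum_{\kappa^{n+1}} \frac{\nu([\kappa^0, \ldots, \kappa^{n+1}])}{\nu([\kappa^0, \ldots, \kappa^n])}\, h_{(n+1,\kappa^{n+1})}
= \nu([\kappa^0, \ldots, \kappa^n])\, h_{(n,\kappa^n)} = \mu([\kappa^0, \ldots, \kappa^n]).
\]

This shows that the measures $\mu([\kappa^0, \ldots, \kappa^n])$ on $S^{\Lambda(n)}$ are consistent for
$n \ge 0$, and also by induction that

\[
\sum_{\kappa^0} \sum_{\kappa^1} \cdots \sum_{\kappa^n} \mu([\kappa^0, \ldots, \kappa^n]) = \sum_{\kappa^0} \mu([\kappa^0]) = 1
\]

for all $n \ge 0$, since $\sum_{\kappa^0} \nu([\kappa^0])\, h_{(0,\kappa^0)} = \pi h = 1$. By the usual extension
theorems we obtain a unique $\mu$ with the desired cylinder probabilities.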
To see that /x g ^y, fix finite A, A, with A A, and choose n large 
enough that A A(n). Let t = {i^; t e T} = (/c°, . . . ) g 12. Intro-

duce the sets 

Ua] = {(o \ coi = ii for all t e A}, 

[la-a] = {oj \ oji = ii for allte A — A}, 

[i/f] = {(o I ojf. = ijji for all t G A(n — \) — A] where ifj 

Then by construction, 

f^([‘ j) = 22 ^ 

and 

j) = 22 w kdv.k")- 

where ijj is summed over over If denotes the 

modification of i obtained by replacing its values on yl(n — \) — A with 
0, and its values on K{n) with /c^, then = vf(t) for all ifj and /c^, 

since v G^y. Thus 

M([‘ J) = 22 ^ W ^ 

}J/ k" 

and so the characteristics of $\mu$ agree with those of $\nu$. Hence $\mu \in \mathcal{G}_V$.
To check the one-to-one correspondence, let $\mathcal{H}$ denote the normalized
non-negative P-regular functions. We have defined mappings
$\rho\colon \mathcal{G}_V \to \mathcal{H}$ and $\sigma\colon \mathcal{H} \to \mathcal{G}_V$.

One easily verifies that $\sigma(\rho(\mu)) = \mu$ and $\rho(\sigma(h)) = h$, as desired. Finally,
$\rho$ is obviously a cone homomorphism in the sense that

\[ \rho(\alpha\mu_1 + (1 - \alpha)\mu_2) = \alpha\rho(\mu_1) + (1 - \alpha)\rho(\mu_2), \]

$0 < \alpha < 1$, $\mu_1, \mu_2 \in \mathcal{G}_V$. This implies the last assertion in the theorem,
and the proof is complete.

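Using the Martin boundary theory for the chain P with starting
distribution $\pi$, as constructed from the reference measure $\nu \in \mathcal{G}_V$, we
will now derive several structure properties of $\mathcal{G}_V$.

Definition 12-26: When $\mu \in \mathcal{G}_V$ and $n \ge 1$, set

\[
\mu^{\kappa^n}([\kappa^0, \ldots, \kappa^{n-1}]) = \mu([\kappa^0, \ldots, \kappa^{n-1}] \mid [\kappa^n]),
\qquad \iota = (\kappa^0, \kappa^1, \ldots) \in \Omega.
\]

For each fixed $\kappa^n$ on $K(n)$, $\mu^{\kappa^n}(\cdot)$ defines a finite random field on
$S^{\Lambda(n-1)}$. The fields $\mu^{\kappa^n}$ are said to have thermodynamic limit $\mu_\infty$,
$\mathcal{T}\text{-}\lim_{n\to\infty} \mu^{\kappa^n} = \mu_\infty$ in notation, if there is a measure $\mu_\infty$ on $\Omega$ such that

\[ \lim_{n \to \infty} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m]) = \mu_\infty([\kappa^0, \ldots, \kappa^m]) \]

for all $m \ge 0$ and all configurations $(\kappa^0, \ldots, \kappa^m)$ on $\Lambda(m)$.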


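Theorem 12-27:

\[
\nu\Bigl(\Bigl\{ \omega \Bigm| \mathcal{T}\text{-}\lim_{n \to \infty} \nu^{\kappa^n(\omega)} \in \mathcal{E}_V \Bigr\}\Bigr) = 1,
\]

where $\kappa^n(\omega)$ is the configuration of $\omega$ restricted to $K(n)$. In particular,
$\mathcal{E}_V \ne \emptyset$ whenever $\mathcal{G}_V \ne \emptyset$.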



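Proof: Let $B_e$ comprise the extreme points of the Martin boundary
for P started with $\pi$, and let $\lambda$ denote harmonic measure. Theorem
10-41 applied to the constant regular function $h = 1$ shows that
$\Pr_\pi[\lim_n y_n \in B_e] = \lambda(B_e) = 1$. Since $\{y_n\}$ visits each state $(m, \kappa^m)$ at most
once, the Martin kernel is given by

\[
K((m, \kappa^m), (n, \kappa^n)) = \frac{P^{(n-m)}_{(m,\kappa^m),(n,\kappa^n)}}{\sum_{\kappa^0} \pi_{(0,\kappa^0)} P^{(n)}_{(0,\kappa^0),(n,\kappa^n)}},
\qquad n \ge m,
\]

(= 0 otherwise). Thus $y_n(\omega) \to x \in B_e$ means that

\[
K((m, \kappa^m), x) = \lim_{n \to \infty} \frac{\nu([\kappa^m, \kappa^n(\omega)])}{\nu([\kappa^m])\, \nu([\kappa^n(\omega)])}
\]

exists for every $(m, \kappa^m)$, and is a minimal regular function of $(m, \kappa^m)$.
By the last theorem, $K(\cdot, x)$ is minimal regular if and only if
$\nu([\kappa^0, \ldots, \kappa^m])\, K((m, \kappa^m), x) = \mu_\infty([\kappa^0, \ldots, \kappa^m])$ for a (unique) $\mu_\infty \in \mathcal{E}_V$.
In this case, we deduce from Proposition 12-23 that

\[
\mu_\infty([\kappa^0, \ldots, \kappa^m])
= \lim_{n \to \infty} \nu([\kappa^0, \ldots, \kappa^m] \mid [\kappa^n(\omega)])
= \lim_{n \to \infty} \nu^{\kappa^n(\omega)}([\kappa^0, \ldots, \kappa^m])
\]

for all m, $\kappa^m$. This shows that $\mathcal{T}\text{-}\lim_{n\to\infty} \nu^{\kappa^n(\omega)} \in \mathcal{E}_V$ whenever
$\lim_n y_n(\omega) \in B_e$.

Using the reference measure $\nu \in \mathcal{G}_V$ we have produced a set of measures
in $\mathcal{E}_V$ which has $\nu$-measure 1.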



Theorem 12-28: The elements of $\mathcal{E}_V$ are in one-to-one correspondence
with those of $B_e$. If $\mu^x \in \mathcal{E}_V$ corresponds to $x \in B_e$, then there is a
bijection between the probability measures on $B_e$ and the neighbor
Gibbs fields in $\mathcal{G}_V$ given by the equations

\[ \mu = \int_{B_e} \mu^x \, d\lambda^{\mu}(x) \]

and

\[
\lambda^{\mu}(E) = \mu\Bigl(\Bigl\{ \omega \Bigm| \mathcal{T}\text{-}\lim_{n \to \infty} \mu^{\kappa^n(\omega)} = \mu^x \text{ for some } x \in E \Bigr\}\Bigr),
\qquad E \text{ Borel in } B_e.
\]



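Proof: We know from Chapter 10 that $B_e$ is in one-to-one
correspondence with the class of minimal normalized regular functions,
while Theorem 12-25 gives a bijection between that class and $\mathcal{E}_V$. Hence
there is a one-to-one correspondence between $B_e$ and $\mathcal{E}_V$. For
$\mu \in \mathcal{G}_V$ apply Theorem 10-41 to $\rho(\mu) \in \mathcal{H}$ and use Lemma 12-24, to get

\[
\frac{\mu([\kappa^0, \ldots, \kappa^m])}{\nu([\kappa^0, \ldots, \kappa^m])}
= \int_{x \in B_e} \frac{\mu^x([\kappa^0, \ldots, \kappa^m])}{\nu([\kappa^0, \ldots, \kappa^m])}\, d\lambda^{\rho(\mu)}(x),
\]

where $\lambda^{\rho(\mu)}$ is harmonic measure for the $\rho(\mu)$-process. A routine computation
using the explicit form of the Martin kernel, derived in the proof of the
previous theorem, shows that

\[
\begin{aligned}
\lambda^{\rho(\mu)}(E) &= \mu\Bigl(\Bigl\{ \omega \Bigm| \mathcal{T}\text{-}\lim_{n \to \infty} \nu^{\kappa^n(\omega)} = \mu^x \text{ for some } x \in E \Bigr\}\Bigr) \\
&= \mu\Bigl(\Bigl\{ \omega \Bigm| \mathcal{T}\text{-}\lim_{n \to \infty} \mu^{\kappa^n(\omega)} = \mu^x \text{ for some } x \in E \Bigr\}\Bigr).
\end{aligned}
\]

(The $\rho(\mu)$-process changes the reference measure to $\mu$, and $\nu^{\kappa^n} = \mu^{\kappa^n}$
since both random fields are defined in terms of the same
characteristics.) Setting $\lambda^{\mu} = \lambda^{\rho(\mu)}$, the theorem follows.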



Corollary 12-29: If $\mu \in \mathcal{E}_V$, then there is an $\iota = (\kappa^0, \kappa^1, \ldots) \in \Omega$ such
that

\[ \mu = \mathcal{T}\text{-}\lim_{n \to \infty} \mu^{\kappa^n}. \]

Proof: The uniqueness of the integral representation implies that if
$\mu = \mu^x \in \mathcal{E}_V$, then $\lambda^{\mu}(\{x\}) = \mu(\{\omega : \mathcal{T}\text{-}\lim_{n\to\infty} \mu^{\kappa^n(\omega)} = \mu^x\}) = 1$. The
desired $\iota$ may therefore be any configuration from a full $\omega$-set with
respect to $\mu$.

The entire development of this section has proceeded on the assumption
that $\mathcal{G}_V$ is not empty. The problem of determining from the
potential V just when $\mathcal{G}_V$ is not empty turns out to be a difficult one if
the state space S is countably infinite. In the case where T is the
integers, we shall have more to say about this later. When S is finite
on the other hand, it will now be proved that $\mathcal{G}_V$ is always non-empty.
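Theorem 12-30: If S is finite, then $\mathcal{G}_V \ne \emptyset$. Moreover, the limits

\[ \mu^{+}([\kappa^0, \ldots, \kappa^m]) = \lim_{n \to \infty} \max_{\kappa^n} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m]) \]

and

\[ \mu^{-}([\kappa^0, \ldots, \kappa^m]) = \lim_{n \to \infty} \min_{\kappa^n} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m]) \]

exist for all $m \ge 0$, $(\kappa^0, \ldots, \kappa^m) \in S^{\Lambda(m)}$, and there is phase multiplicity
for V if and only if

\[ \mu^{+}([\kappa^0, \ldots, \kappa^m]) \ne \mu^{-}([\kappa^0, \ldots, \kappa^m]) \quad \text{for some m and } (\kappa^0, \ldots, \kappa^m). \]

(Recall that $\mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m])$ is a certain characteristic which is
completely and uniquely determined by V.)

To prove the theorem, we first need the following lemma.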

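Lemma 12-31: For given $\mu \in \mathcal{G}_V$, $m \ge 0$ and fixed $(\kappa^0, \ldots, \kappa^m) \in S^{\Lambda(m)}$,
abbreviate

\[
\mu_n^{+} = \max_{\kappa^n \in S^{K(n)}} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m]),
\qquad
\mu_n^{-} = \min_{\kappa^n \in S^{K(n)}} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m]).
\]

Then

(1) $0 < \mu_n^{-} \le \mu([\kappa^0, \ldots, \kappa^m]) \le \mu_n^{+}$ for each $n > m$, and

(2) $\mu_n^{-}$ is increasing and $\mu_n^{+}$ is decreasing as $n \to \infty$.

Proof: (1) $\mu_n^{-}$ is a minimum over a finite set of strictly positive
probabilities, hence strictly positive. When $n > m$,

\[
\mu([\kappa^0, \ldots, \kappa^m]) = \sum_{\kappa^n} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m])\, \mu([\kappa^n])
\ge \mu_n^{-} \sum_{\kappa^n} \mu([\kappa^n]) = \mu_n^{-},
\]

and an analogous estimate establishes the remaining inequality.

(2) For any $n > m$ and $\kappa^{n+1} \in S^{K(n+1)}$,

\[
\mu^{\kappa^{n+1}}([\kappa^0, \ldots, \kappa^m]) = \sum_{\kappa^n} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m])\, \mu([\kappa^n] \mid [\kappa^{n+1}])
\ge \mu_n^{-} \sum_{\kappa^n} \mu([\kappa^n] \mid [\kappa^{n+1}]) = \mu_n^{-}.
\]

This shows that $\mu_n^{-}$ is increasing with n; a similar estimate proves that
$\mu_n^{+}$ is decreasing.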

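Proof of Theorem 12-30: The limits $\mu^{+}$ and $\mu^{-}$ are well-defined by
the monotonicity established in the lemma. We now show that for
any given configuration $(\kappa^0, \ldots, \kappa^m)$ on $\Lambda(m)$, there is a random field
$\underline{\mu} \in \mathcal{G}_V$ such that $\underline{\mu}([\kappa^0, \ldots, \kappa^m]) = \mu^{-}([\kappa^0, \ldots, \kappa^m])$. Let $\underline{\kappa}^n$ denote
the configuration on $K(n)$ for which the minimum value $\mu_n^{-}$ is attained,
and define measures $\underline{\mu}_n$, $n \ge 1$, on $\Omega$ by

\[
\begin{aligned}
\underline{\mu}_n([\iota_A]) &= \mu^{\underline{\kappa}^n}([\iota_A]) \quad \text{whenever } A \subset \Lambda(n - 1), \\
\underline{\mu}_n([\underline{\kappa}^n]) &= 1, \\
\underline{\mu}_n(\{\omega \mid \omega_t = 0\}) &= 1 \quad \text{whenever } t \notin \Lambda(n).
\end{aligned}
\]

(The notation $[\iota_A]$ was introduced in the proof of Theorem 12-25.)
These specifications are clearly consistent, so each $\underline{\mu}_n$ is well-defined.
Now by the finiteness of S, we can use a diagonal argument (like the
one in Proposition 1-63) to choose a subsequence $\{\underline{\mu}_{n'}\}$ such that
$\underline{\mu}_{n'}([\iota_A])$ converges for all possible configurations on every finite $A \subset T$.
By the extension theorem, these cylinder limits give rise to a unique $\underline{\mu}$
on $\Omega$. Now observe that

\[
\underline{\mu}([\kappa^0, \ldots, \kappa^m]) = \lim_{n' \to \infty} \underline{\mu}_{n'}([\kappa^0, \ldots, \kappa^m])
= \lim_{n' \to \infty} \mu^{\underline{\kappa}^{n'}}([\kappa^0, \ldots, \kappa^m]),
\]

the last limit being equal to $\mu^{-}([\kappa^0, \ldots, \kappa^m])$ by definition. To verify
that $\underline{\mu} \in \mathcal{G}_V$ we first note that the measures of $[\iota_A]$ are bounded
away from 0 for n' sufficiently large, since $\underline{\mu}_{n'}([\iota_A]) = \mu^{\underline{\kappa}^{n'}}([\iota_A])$ is strictly
positive as soon as $A \subset \Lambda(n' - 1)$, and the corresponding minimum over
boundary configurations increases with n' by Lemma 12-31. It follows
that $\underline{\mu}$ is strictly positive on finite cylinders, and that
the neighbor Gibbs property is inherited from the $\underline{\mu}_{n'}$ (i.e., the limit
may be interchanged with the operations defining the characteristics).
This completes the proof that $\mathcal{G}_V$ is always non-empty. By an analogous
construction we find a neighbor Gibbs field $\overline{\mu} \in \mathcal{G}_V$ with
$\overline{\mu}([\kappa^0, \ldots, \kappa^m]) = \mu^{+}([\kappa^0, \ldots, \kappa^m])$. If $\mu^{+} \ne \mu^{-}$ (for some $(\kappa^0, \ldots, \kappa^m)$),
then evidently $|\mathcal{G}_V| > 1$. If $\mu^{+} = \mu^{-}$, then Lemma 12-31 shows that
any $\mu \in \mathcal{G}_V$ is uniquely determined on all events $[\kappa^0, \ldots, \kappa^m]$, $m \ge 0$,
hence on all $[\iota_A]$, finite $A \subset T$. This shows that $|\mathcal{G}_V| = 1$, completing
the proof.

Unfortunately, the general criterion for phase multiplicity just given
is often difficult to apply, since $\mu^{+}$ and $\mu^{-}$ may not be readily
computable. A more detailed theory is available for certain "attractive
potentials," examples of which will be mentioned in Section 6.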
5. Homogeneous Markov fields on the integers

Throughout this section the time parameter set T will be Z = the
integers, N = {0, 1, . . .} or −N = {. . ., −1, 0}, with the standard
neighbor structure $\partial$ (i.e. $\partial n = \{n' \in T : |n' - n| = 1\}$). Our objective
is to treat, in some detail, the classes of Markov fields on these infinite
linear lattices.

First let us consider the "one-sided" cases. We show here the
previously mentioned fact that the Markov random fields on N are
simply the Markov processes with strictly positive cylinders.

Proposition 12-32: If T = N and $\mu \in \mathcal{G}_V$, then $\{x_n\}$ is a Markov
process. In particular, there are probability measures $\pi_n$ and transition
matrices $P_n$, $n \ge 0$, such that

\[ \pi_{n,i} = \Pr[x_n = i], \qquad P_{n,ij} = \Pr[x_{n+1} = j \mid x_n = i], \]

and $\pi_n P_n = \pi_{n+1}$. Similarly, if T = −N and $\mu \in \mathcal{G}_V$, then $\{x_n\}$ is a
Markov process on −N, and there are probability measures $\pi_n$ and
transition matrices $P_n$, $n < 0$, satisfying the above relations.

Proof: Without loss of generality, assume that V is normalized.
It suffices to assume T = N and check the Markov property (the proof
for T = −N being analogous). Fix $n > 0$, and set $A = \{1, 2, \ldots, n - 1\}$.
For any $\iota \in \Omega$, define

Zn(io,in)= 2 _2 

keS'^ VB<^A:BnAi=0 ) 

where is the modification obtained by replacing the values of t on ^ 
with those of k. Now choose a particular l g Q, and let l' be the con- 
figuration obtained by replacing the value i^ at site 0 with a 0. Then, 
using the Markov field property at 0, we have 

Pr[a;o = tp a = i^] ^ ^o(Q ^ /^o(0 

Pr[xo = 0 A 

Writing the characteristics according to Definition 12-20, the last term 
is 



exp{F;o)(t) + F(o,i)(0}- 



2„(0, lexp-^ 



B a A '.B^tA 7 ^ 0 



K) ^ exp^ _ 2 Vsii 



B c: A'.Br^A ^ 0 



^n(^05 ^n) 
^n) 
after cancellation. Hence
Pr[xo = io A • • • A = i„] 

= = io A x„ = i„] 

= Zn(io> expj _ 2 ^s(‘) j" = to A ar„ = i„] 



B c A’.BnA 0 



= 2^(0, i^) 1 exp-^ F^O}(0 + _ 2 ^b(0 ^ Pi*K = 0 A = ini 



B <=■ A:BnA 0 

Finally, after further cancellation, we obtain 
Pr[a:„ = | ojq = to A • • • A 

t;j_l)j. y (l) + V (l)l = 0 A Xji = tji] 



^n(^J ^n) 



Pr[xo = 0 A = in-i] 



a conditional probability depending only on $n - 1$, $i_{n-1}$ and $i_n$, which
we may set equal to $P_{n-1,\, i_{n-1} i_n}$.



Proposition 12-33: Let T = N, and suppose that $\nu \in \mathcal{G}_V$ has initial
measure $\pi_0$ and transition matrices $P_n$. Then any $\mu \in \mathcal{G}_V$ is Markovian
with initial measure $\pi'_0$ and transition matrices $P'_n$ which satisfy
$\pi'_{0,i_0} = \pi_{0,i_0} h_{(0,i_0)}$ and $P'_{n,i_n i_{n+1}} = P_{n,i_n i_{n+1}} h_{(n+1,i_{n+1})}/h_{(n,i_n)}$ for some
solution h of the equations

\[ h_{(n,i_n)} = \sum_{i_{n+1}} P_{n,i_n i_{n+1}}\, h_{(n+1,i_{n+1})}. \]

Conversely, any $\mu$ arising in this way is in $\mathcal{G}_V$.

Proof: Apply the construction of the previous section with T = N
and $\Lambda(n) = \{0, 1, \ldots, n\}$. Then the correspondence of Theorem 12-25
is clearly equivalent to the one asserted here, because $\nu$ and $\mu$ are both
Markovian by Proposition 12-32.
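The transformation in Proposition 12-33 is easy to try out on a finite horizon. The sketch below is only an illustration under stated assumptions: a single hypothetical matrix `P` is used at every time, the horizon `N` is finite, and a space-time regular function is manufactured by pulling back made-up positive terminal values `terminal` through `P`. It then forms the transformed initial measure and transition matrices and records their total masses.

```python
def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

pi0 = [0.2, 0.5, 0.3]                       # hypothetical initial measure
P = [[0.5, 0.3, 0.2],                       # hypothetical strictly positive
     [0.2, 0.6, 0.2],                       # transition matrix, used at
     [0.1, 0.3, 0.6]]                       # every time n
N = 4                                       # finite horizon, for illustration only
terminal = [1.0, 2.0, 4.0]                  # hypothetical positive values at time N

# Backward recursion h_(n,i) = sum_j P_ij h_(n+1,j).
h = [None] * (N + 1)
h[N] = terminal
for n in range(N - 1, -1, -1):
    h[n] = matvec(P, h[n + 1])

# Normalize so that sum_i pi0_i h_(0,i) = 1; linearity keeps h regular.
scale = sum(pi0[i] * h[0][i] for i in range(3))
h = [[x / scale for x in level] for level in h]

# Transformed initial measure and transition matrices.
pi0_new = [pi0[i] * h[0][i] for i in range(3)]
P_new = [[[P[i][j] * h[n + 1][j] / h[n][i] for j in range(3)] for i in range(3)]
         for n in range(N)]

row_sums = [sum(row) for Pn in P_new for row in Pn]
```

Regularity of h is exactly what makes each row of each transformed matrix sum to 1, and the normalization is what makes the transformed initial measure a probability measure.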



Of course, an obvious analogue of the last result holds in case T =
−N. A concrete example of phase multiplicity on a half-line will be
given later in this section.

We turn our attention now to the "two-sided" case, T = Z. In
contrast to the one-sided setting, we shall see shortly that there are
Markov fields on Z which are not Markov processes. Since the neighbor
structure $\partial$ on Z commutes with translation, it is possible to define a
class of homogeneous Markov fields.
Definition 12-34: A Markov field is homogeneous if its local
characteristics are invariant under translation, that is, if the local
characteristic at site $m + n$ evaluated at the shifted configuration equals
the local characteristic at site m evaluated at $\iota$,

for all $m, n \in Z$, $\iota \in \Omega$,

where the value of the shifted configuration at site t is $i_{t-n}$. A neighbor
potential U is homogeneous if

\[ U_{\{m\}}(\iota) = U_j \quad \text{and} \quad U_{\{m,m+1\}}(\iota) = U_{jk} \quad \text{whenever } i_m = j,\ i_{m+1} = k. \]

Two facts follow immediately from the definitions. First, if $\{x_n\}$ is
homogeneous Markov, then the canonical V of Theorem 12-21 is a
homogeneous neighbor potential. Second, any neighbor Gibbs field
with homogeneous neighbor potential U is homogeneous Markov.
For such a U, if we set $Q_{jk} = \exp\{\tfrac12 U_j + U_{jk} + \tfrac12 U_k\}$, then $\mu$ is a Gibbs
field with potential U if and only if

\[
\Pr[x_m = i \mid x_{m-1} = j,\ x_{m+1} = k] = \frac{Q_{ji} Q_{ik}}{\sum_{s \in S} Q_{js} Q_{sk}} = \frac{Q_{ji} Q_{ik}}{(Q^2)_{jk}}
\]

whenever $i_{m-1} = j$, $i_m = i$ and $i_{m+1} = k$. On the other hand, any
strictly positive Q defines consistent local characteristics by means of
the above equation, and these in turn give rise to a homogeneous
neighbor Gibbs potential for $\mu$, according to Theorem 12-21. Thus we
obtain a multiplicative representation for the characteristics in terms
of Q which is more convenient than the one involving the canonical
potential V. Let $\mathcal{G}_Q$ denote the class of Markov fields determined by Q
in this manner. An immediate requirement for $\mathcal{G}_Q$ to be non-empty is
$(Q^2)_{jk} < \infty$, and since the local characteristics determine all the
characteristics, it follows from this assumption that $(Q^n)_{jk} < \infty$ for all n, so that

\[
\Pr[x_m = i_m, \ldots, x_n = i_n \mid x_t = i_t,\ t \in \Lambda - [m, n]]
= \frac{Q_{i_{m-1} i_m} Q_{i_m i_{m+1}} \cdots Q_{i_n i_{n+1}}}{(Q^{n-m+2})_{i_{m-1} i_{n+1}}}
\]

whenever $m \le n$ and $[m - 1, n + 1] \subset \Lambda$. (Here $[m, n]$ denotes
$\{m, m + 1, \ldots, n\}$.)

We have seen that any homogeneous neighbor Gibbs field is in $\mathcal{G}_Q$
for some Q. But many matrices Q' give rise to the same characteristics,
just as in the potential representation. By definition, either
$\mathcal{G}_Q = \mathcal{G}_{Q'}$ or $\mathcal{G}_Q \cap \mathcal{G}_{Q'} = \emptyset$; say that Q is equivalent to Q' in the former
case ($Q \sim Q'$ in notation).



Proposition 12-35: Strictly positive matrices Q' and Q" are equivalent
if and only if

\[ Q''_{ij} = c\, Q'_{ij}\, \frac{h_j}{h_i}, \qquad i, j \in S, \]

for some $c > 0$ and strictly positive $h = (h_u)_{u \in S}$.

Proof: The statements and proofs of Proposition 12-14 and Corollary
12-15 are easily modified to apply to neighbor potentials on an infinite
parameter set by replacing T with $\bar A$. In our case, if U' and U" are
the homogeneous neighbor potentials such that $U'_k = U''_k = 0$ for all k
and $U'_{ij} = \ln Q'_{ij}$, $U''_{ij} = \ln Q''_{ij}$, then U' and U" determine the same
Gibbs fields if and only if

\[
\sum_{B \subset A \subset D \subset \bar A} (-1)^{|A-B|}\, \delta_D(\iota^B) = 0 \quad \text{for every } A = \{m\},\ \{m, m + 1\},
\]

where $\delta = U'' - U'$. Setting $\delta_{ij} = U''_{ij} - U'_{ij}$, these equations become

(1) $\delta_{k0} + \delta_{0k} - 2\delta_{00} = 0$, $k \in S$,

(2) $\delta_{ij} - \delta_{i0} - \delta_{0j} + \delta_{00} = 0$, $i, j \in S$.

The combination (2) + ½((1) with k = i) + ½((1) with k = j) yields

(3) $(U''_{ij} - U'_{ij}) + w_i - w_j - z = 0$,

where $w_k = \tfrac12(\delta_{0k} - \delta_{k0})$, $z = \delta_{00}$.

Defining $h_k = e^{w_k}$ and $c = e^{z}$, the desired equation in terms of Q' and
Q" follows. Conversely, if the hypothesis holds then

\[
\frac{Q''_{ij} Q''_{jk}}{(Q''^2)_{ik}}
= \frac{c^2\, Q'_{ij}(h_j/h_i)\, Q'_{jk}(h_k/h_j)}{c^2 \sum_{s \in S} Q'_{is}(h_s/h_i)\, Q'_{sk}(h_k/h_s)}
= \frac{Q'_{ij} Q'_{jk}}{(Q'^2)_{ik}},
\]

so that Q' and Q" determine the same local characteristics, i.e., $Q' \sim Q''$.
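The converse half of Proposition 12-35 is a cancellation that can be confirmed numerically. In the sketch below the matrix `Q1`, the vector `h`, and the constant `c` are arbitrary made-up positive values; `Q2` is built from them by the rule of the proposition, and the two matrices are compared through the local characteristic $Q_{ji}Q_{ik}/(Q^2)_{jk}$.

```python
Q1 = [[1.0, 2.0, 0.5],      # hypothetical strictly positive matrix Q'
      [0.3, 1.5, 2.5],
      [4.0, 0.2, 1.0]]
h = [1.0, 3.0, 0.5]         # hypothetical strictly positive vector
c = 2.5                     # hypothetical positive constant
n = len(Q1)

# Q''_ij = c * Q'_ij * h_j / h_i
Q2 = [[c * Q1[i][j] * h[j] / h[i] for j in range(n)] for i in range(n)]

def local_char(Q, j, i, k):
    """Q_ji Q_ik / (Q^2)_jk: law of the middle state i given neighbors j and k."""
    return Q[j][i] * Q[i][k] / sum(Q[j][s] * Q[s][k] for s in range(len(Q)))

max_gap = max(abs(local_char(Q1, j, i, k) - local_char(Q2, j, i, k))
              for j in range(n) for i in range(n) for k in range(n))
```

The factor $c^2$ and the ratios $h_k/h_i$ appear in both numerator and denominator, so `max_gap` should be zero up to rounding.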

The remainder of this section will be devoted to an analysis of
the class $\mathcal{G}_Q$ of homogeneous Markov fields on Z determined by a strictly
positive matrix Q. We let $\mathcal{E}_Q$ comprise the extreme measures in $\mathcal{G}_Q$.
It will now be proved, using the Martin boundary arguments of the
previous section, that these extreme Markov fields on Z are always
Markov processes.

Theorem 12-36: If $\mu \in \mathcal{E}_Q$, then $\{x_n\}_{n \in Z}$ is a Markov process. Thus
$\mu$ is determined by measures $\pi_n$ and transition matrices $P_n$, $n \in Z$, where

\[ \pi_{n,i} = \Pr[x_n = i], \qquad P_{n,ij} = \Pr[x_{n+1} = j \mid x_n = i], \]

and $\pi_n P_n = \pi_{n+1}$.

Proof: It suffices to check the Markov property. For $n \ge 0$, let
$\Lambda(n) = \{-n, -n + 1, \ldots, n\}$. By Corollary 12-29, if $\mu \in \mathcal{E}_Q$, then
$\mu = \mathcal{T}\text{-}\lim_{n\to\infty} \mu^{\kappa^n}$ for some $(\kappa^0, \kappa^1, \ldots) \in \Omega$. This implies that there are
states $k_n \in S$ for $n \in Z$ such that

\[
\Pr[x_l = i_l \wedge x_{l+1} = i_{l+1} \wedge \cdots \wedge x_m = i_m]
= \lim_{n \to \infty} \Pr[x_l = i_l \wedge x_{l+1} = i_{l+1} \wedge \cdots \wedge x_m = i_m \mid x_{-n} = k_{-n} \wedge x_n = k_n]
\]

whenever $l \le m$. Therefore

\[
\Pr[x_m = i_m \mid x_l = i_l \wedge x_{l+1} = i_{l+1} \wedge \cdots \wedge x_{m-1} = i_{m-1}]
= \lim_{n \to \infty} \Pr[x_m = i_m \mid x_{-n} = k_{-n} \wedge x_l = i_l \wedge \cdots \wedge x_{m-1} = i_{m-1} \wedge x_n = k_n].
\]

By the Markov field property, for all n sufficiently large the right hand
side becomes

\[
\begin{aligned}
&\lim_{n \to \infty} \Pr[x_m = i_m \mid x_{-n} = k_{-n} \wedge x_{m-1} = i_{m-1} \wedge x_n = k_n] \\
&\quad = \lim_{n \to \infty} \frac{\Pr[x_{m-1} = i_{m-1} \wedge x_m = i_m \mid x_{-n} = k_{-n} \wedge x_n = k_n]}{\Pr[x_{m-1} = i_{m-1} \mid x_{-n} = k_{-n} \wedge x_n = k_n]} \\
&\quad = \Pr[x_m = i_m \mid x_{m-1} = i_{m-1}] = P_{m-1,\, i_{m-1} i_m}.
\end{aligned}
\]

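Next we present a useful representation theorem for the extreme
homogeneous Gibbs states with matrix Q.

Theorem 12-37: If $\mu \in \mathcal{E}_Q$ then there are strictly positive functions
$g_{(n,i)}$ and $h_{(n,i)}$ ($n \in Z$, $i \in S$) such that

(1) $g_{(n+1,i_{n+1})} = \sum_{i_n \in S} g_{(n,i_n)}\, Q_{i_n i_{n+1}}$,

(2) $h_{(n,i_n)} = \sum_{i_{n+1} \in S} Q_{i_n i_{n+1}}\, h_{(n+1,i_{n+1})}$,

and

(3) $\sum_{i_n \in S} g_{(n,i_n)}\, h_{(n,i_n)} = 1$,

and such that the measures $\pi_n$ and transition matrices $P_n$ for $\{x_n\}$ are
given by

\[ \pi_{n,j} = g_{(n,j)}\, h_{(n,j)} \quad \text{and} \quad P_{n,jk} = Q_{jk}\, h_{(n+1,k)} / h_{(n,j)}. \]

Moreover, there are constants $c', c'' > 0$ and $k_n \in S$ ($n \in Z$) such that

\[
\begin{aligned}
g_{(m,i_m)} &= c' \lim_{n \to -\infty} (Q^{m-n})_{k_n i_m} / (Q^{-n})_{k_n 0}, \qquad m \le 0,\ i_m \in S, \\
h_{(m,i_m)} &= c'' \lim_{n \to \infty} (Q^{n-m})_{i_m k_n} / (Q^{n})_{0 k_n}, \qquad m \ge 0,\ i_m \in S.
\end{aligned}
\]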

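Proof: Since $\{x_n\}$ is a Markov process, for all $j, k \in S$,

\[
\frac{Q_{jk} Q_{k0}}{(Q^2)_{j0}} = \frac{P_{n,jk}\, P_{n+1,k0}}{(P_n P_{n+1})_{j0}}
\quad \text{and} \quad
\frac{Q_{k0} Q_{00}}{(Q^2)_{k0}} = \frac{P_{n+1,k0}\, P_{n+2,00}}{(P_{n+1} P_{n+2})_{k0}},
\]

both sides of the first equation being the conditional probability that
$x_{n+1} = k$ given $x_n = j$ and $x_{n+2} = 0$, and both sides of the second the
conditional probability that $x_{n+2} = 0$ given $x_{n+1} = k$ and $x_{n+3} = 0$. Let
$\tilde h_{(n,j)} = (Q^2)_{j0} / (P_n P_{n+1})_{j0}$ and $\tilde c_n = Q_{00} / P_{n+1,00}$. Then dividing the first equation
by the second and rearranging terms,

\[ \frac{P_{n,jk}}{Q_{jk}} = \frac{1}{\tilde c_{n+1}} \cdot \frac{\tilde h_{(n+1,k)}}{\tilde h_{(n,j)}}. \]

Choose $c_n$, $n \in Z$, so that $c_0 = 1$ and $c_n / c_{n+1} = \tilde c_{n+1}$. Now define $h_{(n,j)} =
c_n \tilde h_{(n,j)}$ to get

\[ P_{n,jk} = Q_{jk}\, h_{(n+1,k)} / h_{(n,j)} \]

as desired. Equation (2) follows immediately from the fact that $P_n$ is
a transition matrix. If we define $g_{(n,j)} = \pi_{n,j} / h_{(n,j)}$, then (3) holds
because $\pi_n$ is a probability measure, while the equation $\pi_n P_n = \pi_{n+1}$
implies (1). It therefore remains only to derive the representation of g
and h as ratio limits of powers of Q. To this end, choose $\Lambda(n)$ and $k_n$
as in the proof of the previous theorem. Then for $m \ge 0$,

\[
\begin{aligned}
\Pr[x_m = i_m \mid x_0 = 0]
&= \lim_{n \to \infty} \Pr[x_m = i_m \mid x_{-n} = k_{-n} \wedge x_0 = 0 \wedge x_n = k_n] \\
&= \lim_{n \to \infty} \Pr[x_m = i_m \mid x_0 = 0 \wedge x_n = k_n],
\end{aligned}
\]

the last since $\{x_n\}$ is a Markov field. In terms of g, h and Q we have

\[
\frac{g_{(0,0)} (Q^m)_{0 i_m} h_{(m,i_m)}}{g_{(0,0)} h_{(0,0)}}
= \lim_{n \to \infty} \frac{g_{(0,0)} (Q^m)_{0 i_m} (Q^{n-m})_{i_m k_n} h_{(n,k_n)}}{g_{(0,0)} (Q^n)_{0 k_n} h_{(n,k_n)}},
\]

so that

\[ h_{(m,i_m)} = h_{(0,0)} \lim_{n \to \infty} (Q^{n-m})_{i_m k_n} / (Q^n)_{0 k_n}. \]

An analogous computation yields the result for g when $m \le 0$.

With the aid of the above representation, cases where $|\mathcal{G}_Q| = 0$, 1
and $\infty$ will now be discussed.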



Example 12-38: Let S = Z, and consider any matrix Q of the form

\[ Q_{ij} = q_{j-i} > 0, \quad i, j \in Z, \qquad \text{where } \sum_{i \in Z} q_i = 1. \]

Suppose that $\mu \in \mathcal{E}_Q$, with g and h the functions of Theorem 12-37.
Equations (1) and (2) of that theorem say that $g_{(-m,i)}$ and $h_{(m,i)}$
are space-time harmonic for sums of independent random variables.
By analogy to Example 6 of Section 10-13, one can show that the
extreme such functions are of the form $c^n t^i$ for some $c > 0$, $t > 0$
($q_i > 0$ excludes degenerate solutions). Thus,

\[ g_{(0,i)}\, h_{(0,i)} = \int_0^\infty \!\! \int_0^\infty s^i t^i \, d\nu(s)\, d\xi(t) \]

for some measures $\nu$ and $\xi$ on $(0, \infty)$. It follows that

\[
\sum_{i \in Z} g_{(0,i)}\, h_{(0,i)} = \int_0^\infty \!\! \int_0^\infty \Bigl[ 1 + \sum_{i=1}^\infty (st)^i + \sum_{i=1}^\infty (st)^{-i} \Bigr] d\nu(s)\, d\xi(t) = \infty,
\]

since one of the two infinite sums must diverge for any given s and t.
This contradicts equation (3) of Theorem 12-37, so $\mathcal{E}_Q$ is empty.
$\mathcal{G}_Q$ is therefore empty by Theorem 12-28.

If $Q > 0$ is an ergodic transition matrix, then there is a unique
probability measure $\alpha > 0$ such that $\alpha Q = \alpha$. In this case $\mathcal{G}_Q$ clearly
contains the stationary process with

\[ \pi_{n,i} = \alpha_i \quad \text{and} \quad P_{n,ij} = Q_{ij} \quad \text{for all } n \in Z. \]

Thus, whenever $Q' \sim Q$ for some ergodic Q, then $|\mathcal{G}_{Q'}| \ne 0$. Our next
goal is to show that when S is finite $\mathcal{G}_{Q'}$ contains exactly one Markov
field for any $Q' > 0$, and that this field is a stationary Markov chain on
Z. The first step is contained in a lemma.

Lemma 12-39: If S is finite, then any $Q' > 0$ is equivalent to a strictly
positive transition matrix Q (with $Q\mathbf{1} = \mathbf{1}$).

Proof: We show that there is a vector $h > 0$ such that $Q'h = ch$ for
some constant $c > 0$. Then, defining

\[ Q_{ij} = \frac{Q'_{ij}\, h_j}{c\, h_i}, \]

it follows that Q is a transition matrix, while $Q' \sim Q$ by Proposition
12-35. To get h, let $\mathcal{S}$ be the set of vectors $h \ge 0$ with $\sum_{i \in S} h_i = 1$, and
define $c = \sup\{c' : Q'h \ge c'h \text{ for some } h \in \mathcal{S}\}$. Easy estimates prove
that $0 < \min_{i,j} Q'_{ij} \le c \le |S| \max_{i,j} Q'_{ij} < \infty$. By definition of c and
Proposition 1-63, there are constants $c^{(n)}$ and elements $h^{(n)}$ of $\mathcal{S}$,
n = 0, 1, . . ., such that $Q'h^{(n)} \ge c^{(n)} h^{(n)}$, $c^{(n)} \to c$, and $\lim_{n\to\infty} h^{(n)} = h$
for some $h \in \mathcal{S}$. This implies that $Q'h \ge ch$. Now if $(Q'h)_j > ch_j$ for
some j, then $Q'(Q'h) > c(Q'h)$ so $Q'(Q'h) \ge (c + \epsilon)(Q'h)$ for small $\epsilon$.
But this contradicts the definition of c once we normalize $Q'h$. Hence
$ch = Q'h > 0$, which shows that $h > 0$.
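The positive eigenvector of Lemma 12-39 can be found numerically. The sketch below does not follow the compactness argument of the proof; it uses power iteration, a different and standard technique for strictly positive matrices, on a made-up matrix `Qp`. It then forms $Q_{ij} = Q'_{ij} h_j / (c h_i)$ and records the row sums.

```python
Qp = [[2.0, 1.0, 1.0],      # hypothetical strictly positive matrix Q'
      [1.0, 3.0, 1.0],
      [1.0, 1.0, 4.0]]
n = len(Qp)

# Power iteration: repeatedly apply Q' and renormalize so that sum(h) == 1.
h = [1.0 / n] * n
c = 0.0
for _ in range(200):
    w = [sum(Qp[i][j] * h[j] for j in range(n)) for i in range(n)]
    c = sum(w)              # estimate of the eigenvalue, since sum(h) == 1
    h = [x / c for x in w]

# Equivalent transition matrix Q_ij = Q'_ij h_j / (c h_i).
Q = [[Qp[i][j] * h[j] / (c * h[i]) for j in range(n)] for i in range(n)]
row_sums = [sum(row) for row in Q]
```

Once the iteration has converged, each row sum equals $(Q'h)_i / (c h_i) = 1$, so Q is a strictly positive transition matrix of the kind the lemma asserts.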

Theorem 12-40: If S is finite, then $|\mathcal{G}_{Q'}| = 1$. $\mathcal{G}_{Q'}$ consists of the
Markov chain with transition matrix Q and stationary measure $\alpha$, where
Q is defined as in Lemma 12-39 and $\alpha$ is the regular probability measure
for Q.

Proof: Suppose $\mu \in \mathcal{E}_Q$ and let $k_n$, $n \in Z$, be the states in the ratio
limit representation of Theorem 12-37 for the functions g and h.
Since S is finite, there is some state $j'' \in S$ and an infinite sequence
$n'' \to \infty$ such that $k_{n''} = j''$. Hence

\[
h_{(m,i_m)} = c'' \lim_{n'' \to \infty} \frac{(Q^{n''-m})_{i_m j''}}{(Q^{n''})_{0 j''}} = c''\, \frac{\alpha_{j''}}{\alpha_{j''}} = c'',
\]

by the convergence theorem for noncyclic ergodic chains. Thus h is
constant. Similarly, there is some j' and sequence $n' \to -\infty$ with
$k_{n'} = j'$, whereby

\[
g_{(m,i_m)} = c' \lim_{n' \to -\infty} \frac{(Q^{m-n'})_{j' i_m}}{(Q^{-n'})_{j' 0}} = c'\, \frac{\alpha_{i_m}}{\alpha_0}.
\]

It follows that $\pi_{n,j} = \alpha_j$ and $P_{n,jk} = Q_{jk}$. In other words, $\mu$ is uniquely
determined as the process described in the statement of the theorem.
For any strictly positive finite matrix Q' with equivalent transition
matrix Q, we therefore have $|\mathcal{E}_{Q'}| = |\mathcal{E}_Q| = 1$. Theorem 12-28 now
implies $|\mathcal{G}_{Q'}| = 1$.
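The squeeze behind Lemma 12-31 and the uniqueness in Theorem 12-40 can be watched numerically on the lattice Z with $\Lambda(n) = \{-n, \ldots, n\}$. For a hypothetical strictly positive transition matrix `Q`, the sketch below evaluates $\Pr[x_0 = i \mid x_{-n} = k,\ x_n = k'] = (Q^n)_{ki}(Q^n)_{ik'}/(Q^{2n})_{kk'}$, takes the minimum and maximum over the boundary states, and compares the results with the stationary measure, which is approximated here by a row of a high power of Q.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][s] * B[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]

Q = [[0.5, 0.3, 0.2],       # hypothetical strictly positive transition matrix
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
S = range(3)
i0 = 0                      # state whose probability at site 0 is tracked

lower, upper = [], []
Qn = Q                      # holds Q^n, starting with n = 1
for n in range(1, 21):
    Q2n = matmul(Qn, Qn)
    vals = [Qn[k][i0] * Qn[i0][kp] / Q2n[k][kp] for k in S for kp in S]
    lower.append(min(vals))
    upper.append(max(vals))
    Qn = matmul(Qn, Q)

# Approximate the stationary measure by a row of Q^64 (six squarings).
Qbig = Q
for _ in range(6):
    Qbig = matmul(Qbig, Qbig)
alpha = Qbig[0]
```

The lower values should be non-decreasing, the upper values non-increasing, and both should approach `alpha[i0]`, so that the boundary condition has no influence in the limit.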



Next we present a concrete example of a matrix Q with phase multi- 
plicity, for which all the elements of can be exhibited explicitly. 



Example 12-41; Let 8 = {0, 1, . . .}, and consider the strictly positive 
matrix Q given by 



Qi 



i Ay 

= 2 

k = 0 




[e-^W-V(j - *)!] 



(i A j = min{i, j}). 



Q may be thought of as describing transition from i to j particles in a 
population. First particles disappear independently with probability 
1/2, and then an independent Poisson distributed population of mean 
1/2 is added to those which remain. This interpretation makes it clear 
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that Ql = 1. We remark next that Q is ergodic, with regular measure 
ai = e~^ji\, i E S. In fact, Q is a-reversible; 




(l).H-y-fc 






since the terms in square brackets agree for every i, j and k. Thus 
contains the stationary Markov chain with ^ and = Qa 
every n eZ. In order to determine the other elements of ^q, we first 
compute the powers of Q. For this purpose it is convenient to introduce 
the generating functions yf(5) = 2f=o (kl 1)» We 

claim : 



= [1 - (ir(l - 5)rexp{-[l - (1)-](1 - s)}. 

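When $n = 0$ both sides are $s^i$; the result follows by induction from the
following considerations. If we start with $i$ particles at time 0, and
$S_n$ denotes the number at time $n$, then $S_{n+1} = X + \sum_{k=1}^{S_n} Y_k$,
where $X$ and the $Y_k$ are independent, $X$ Poisson with mean 1/2, the
$Y_k$ taking on values 0 and 1 each with probability 1/2. Since $X$ has
generating function $e^{-(1-s)/2}$ and each $Y_k$ has generating function
$(1 + s)/2$, it follows from the formula for the generating function of a
random sum that the generating function $\gamma_i^{n+1}(s)$ of $S_{n+1}$ equals
$e^{-(1-s)/2}\,\gamma_i^n((1 + s)/2)$ for $n \ge 0$. The desired formula for $\gamma_i^n(s)$ satisfies
this recursion relation. Hence $(Q^n)_{ij}$, the coefficient of $s^j$ in the power
series expansion of $\gamma_i^n(s)$, is given by

$$(Q^n)_{ij} = \sum_{k=0}^{i \wedge j} \binom{i}{k} \left(\tfrac{1}{2}\right)^{nk} \left(1 - \left(\tfrac{1}{2}\right)^n\right)^{i-k} \left[ e^{-(1 - (1/2)^n)} \left(1 - \left(\tfrac{1}{2}\right)^n\right)^{j-k} \Big/ (j-k)! \right].$$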
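The closed form for the powers of $Q$ can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper names `q_power_entry` and `mat_mul` and the truncation level `N = 40` are assumptions made for illustration. It builds $Q$ from its defining formula on the truncated state space $\{0, \ldots, N-1\}$, multiplies it out to get $Q^3$, compares the result with the closed form for $n = 3$, and also checks $\alpha$-reversibility with $\alpha_i = e^{-1}/i!$.

```python
from math import comb, exp, factorial

def q_power_entry(i, j, n=1):
    """Closed form for (Q^n)_{ij}: each of the i particles survives n steps
    with probability 2**-n, and an independent Poisson(1 - 2**-n) number is added."""
    p = 0.5 ** n
    lam = 1.0 - p
    return sum(
        comb(i, k) * p**k * (1.0 - p)**(i - k)
        * exp(-lam) * lam**(j - k) / factorial(j - k)
        for k in range(min(i, j) + 1)
    )

def mat_mul(a, b):
    """Plain matrix product of two square lists-of-lists."""
    size = len(a)
    return [[sum(a[r][t] * b[t][c] for t in range(size)) for c in range(size)]
            for r in range(size)]

N = 40  # truncation level (assumption); the mass beyond N is negligible for small i
Q = [[q_power_entry(i, j) for j in range(N)] for i in range(N)]
Q3 = mat_mul(mat_mul(Q, Q), Q)

# Compare the matrix power with the closed form on a small corner of the matrix.
max_err = max(abs(Q3[i][j] - q_power_entry(i, j, 3))
              for i in range(6) for j in range(6))
# Row sums should be 1 up to the truncated tail.
row_sums = [sum(Q[i]) for i in range(6)]
# Reversibility: alpha_i Q_ij = alpha_j Q_ji.
alpha = [exp(-1.0) / factorial(i) for i in range(N)]
rev_err = max(abs(alpha[i] * Q[i][j] - alpha[j] * Q[j][i])
              for i in range(10) for j in range(10))
print(max_err, row_sums, rev_err)
```

Truncating the state space loses only the paths that pass through states at or above `N`, which for small starting states has negligible probability, so the comparison is limited mainly by floating-point rounding.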



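If $\mu \in \mathscr{E}_Q$, then according to Theorem 12-37, the limits

$$g_{(m,i)} = c' \lim_{n \to \infty} \frac{(Q^{n+m})_{k_{-n} i}}{(Q^{n})_{k_{-n} 0}}, \qquad i \in S,$$

exist and are strictly positive, for some $c' > 0$ and fixed sequence
$k_n$, $n \in \mathbb{Z}$. We show next that these limits exist if and only if $\lim_{n \to \infty} k_{-n}/2^n = \theta$
for some $\theta$ $(\ge 0)$. Under this assumption

$$\left(1 - \left(\tfrac{1}{2}\right)^{n+m}\right)^{k_{-n}} \to e^{-\theta (1/2)^m}$$

and

$$\binom{k_{-n}}{l} \left(\tfrac{1}{2}\right)^{(n+m) l} \to \left[\theta \left(\tfrac{1}{2}\right)^m\right]^l \Big/ l!.$$

Hence

$$\lim_{n \to \infty} (Q^{n+m})_{k_{-n} i} = e^{-\theta (1/2)^m} \sum_{l=0}^{i} \frac{\left[\theta \left(\tfrac{1}{2}\right)^m\right]^l}{l!} \left[ \frac{e^{-1}}{(i-l)!} \right] = e^{-(\theta (1/2)^m + 1)} \sum_{l=0}^{i} \binom{i}{l} \left[\theta \left(\tfrac{1}{2}\right)^m\right]^l \Big/ i!.$$

By the binomial theorem this last quantity is the Poisson probability
$e^{-(\theta (1/2)^m + 1)} (\theta (\tfrac{1}{2})^m + 1)^i / i!$. Since $(Q^n)_{k_{-n} 0} \to e^{-(\theta + 1)}$, we can take
it to be $g_{(m,i)}$ by choosing $c' = e^{-(\theta + 1)}$. On
the other hand, if $\lim_{n \to \infty} k_{-n}/2^n$ does not exist then we must have
$k_{-n}/2^n \to \infty$ as $n \to \infty$, for otherwise the defining ratios would converge
to distinct limits along different subsequences. But the ratio
$(Q^{n+1})_{k_{-n} i} / (Q^{n})_{k_{-n} 0}$, for example, is a sum of positive terms including

$$\frac{e^{-(1 - (1/2)^{n+1})}}{e^{-(1 - (1/2)^{n})}} \left[ \frac{1 - (\tfrac{1}{2})^{n+1}}{1 - (\tfrac{1}{2})^{n}} \right]^{k_{-n}} \frac{\left[1 - (\tfrac{1}{2})^{n+1}\right]^i}{i!} \ge c_i \left[ \frac{1 - (\tfrac{1}{2})^{n+1}}{1 - (\tfrac{1}{2})^{n}} \right]^{k_{-n}}, \qquad c_i > 0,$$

and this last expression tends to $\infty$ as $n \to \infty$ if $k_{-n}/2^n \to \infty$. We have
therefore shown that the limits defining $g_{(m,i)}$ exist if and only if
$k_{-n}/2^n \to \theta \in [0, \infty)$ as $n \to \infty$, in which case $g_{(m,i)}$ as a function of $i$ may
be taken to be Poisson with mean $\theta (\tfrac{1}{2})^m + 1$. Since $Q$ is $\alpha$-reversible,
$\alpha_i (Q^n)_{ij} = \alpha_j (Q^n)_{ji}$, and hence the function $h_{(m,i)}$ of Theorem 12-37 can
be computed as

$$h_{(m,i)} = c'' \lim_{n \to \infty} \frac{(Q^{n-m})_{i k_n}}{(Q^{n})_{0 k_n}} = c'' \lim_{n \to \infty} \frac{\alpha_0 (Q^{n-m})_{k_n i}}{\alpha_i (Q^{n})_{k_n 0}} = c''' e^{-(\eta 2^m + 1)} (\eta 2^m + 1)^i$$

if $k_n/2^n \to \eta \in [0, \infty)$; otherwise the limit does not exist. Condition (3)
of Theorem 12-37 now dictates $c''' = e^{1 - \theta\eta}$. In summary, we have
proved that if $\mu \in \mathscr{E}_Q$, then $\mu$ is one of the Markov process measures
$\mu_{\theta\eta}$, $\theta, \eta \in [0, \infty)$. Here $\theta$ determines $g$, $\eta$ determines $h$, and then $g$ and
$h$ determine $\mu_{\theta\eta}$ according to Theorem 12-37. Finally, it remains to
check that all the $\mu_{\theta\eta}$ are extreme, so that in fact $\mathscr{E}_Q = \{\mu_{\theta\eta}: \theta, \eta \in [0, \infty)\}$.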

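One first verifies, as in Example 6 of Section 10-13, that the topology
of the Martin boundary $B$ associated with $Q$ is the usual topology on
$\mathbb{R}_+^2 = \{(\theta, \eta): \theta \ge 0, \eta \ge 0\}$. Then by Theorem 12-28,

$$\mu_{\theta\eta} = \int_0^\infty \!\! \int_0^\infty \mu_{\theta'\eta'} \, d\lambda(\theta', \eta')$$

for a unique probability measure $\lambda$ on $\mathbb{R}_+^2$ such that $\lambda(\{(\theta', \eta'): \mu_{\theta'\eta'} \in \mathscr{E}_Q\}) = 1$.
Evaluating both sides on $\{\omega \mid \omega_n = i, \omega_{n+1} = j\}$, we derive the
equation

$$e^{-\theta\eta} \left\{ e^{-(\theta (1/2)^n + 1)} (\theta (\tfrac{1}{2})^n + 1)^i / i! \right\} \left\{ e^{-(\eta 2^{n+1} + 1)} (\eta 2^{n+1} + 1)^j / j! \right\}$$
$$= \int_0^\infty \!\! \int_0^\infty e^{-\theta'\eta'} \left\{ e^{-(\theta' (1/2)^n + 1)} (\theta' (\tfrac{1}{2})^n + 1)^i / i! \right\} \left\{ e^{-(\eta' 2^{n+1} + 1)} (\eta' 2^{n+1} + 1)^j / j! \right\} d\lambda(\theta', \eta').$$

Multiply both sides by $u^i v^j$ $(0 \le u, v < 1)$, sum over $i, j \in S$, and interchange
summation and integration, to obtain

$$e^{-\theta\eta} e^{-\theta (1/2)^n (1 - u)} e^{-\eta 2^{n+1} (1 - v)} = \int_0^\infty \!\! \int_0^\infty e^{-\theta'\eta'} e^{-\theta' (1/2)^n (1 - u)} e^{-\eta' 2^{n+1} (1 - v)} \, d\lambda(\theta', \eta').$$

If we make the change of variables $x = (1 - u)/2^n$, $y = 2^{n+1}(1 - v)$,
then for $x > 0$ and $y > 0$ the right hand side is the double Laplace
transform of the measure $e^{-\theta'\eta'} d\lambda(\theta', \eta')$. Since the equation is satisfied
when $\lambda$ concentrates at $(\theta, \eta)$, the uniqueness theorem for Laplace
transforms implies that $\lambda$ must be this measure. We conclude that
$\mu_{\theta\eta} \in \mathscr{E}_Q$, as desired.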

Whenever $Q$ is a transition matrix and $\mu \in \mathscr{E}_Q$ admits a representation
with $h = 1$, then $\pi_{n,i} = g_{(n,i)}$ and $P_n = Q$. A family of probability
distributions $\pi_n$, $n \in \mathbb{Z}$, such that $\pi_n Q = \pi_{n+1}$ is called an entrance law
for $Q$. For example, the Poisson distributions with mean $\theta (\tfrac{1}{2})^n + 1$
for fixed $\theta \in [0, \infty)$ constitute an entrance law for the matrix $Q$ of the
last example. The case $\theta = 0$ yields the stationary Markov process
with regular measure $\alpha$; when $\theta > 0$ the process $\{x_n\}$ with measure $\mu_{\theta 0}$
"comes down from infinity" in the sense that $\lim_{n \to -\infty} \Pr[x_n = i] = 0$
for any $i \in S$. A more surprising type of entrance law is described in
Problem 8 at the end of the chapter.
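The entrance-law property of the Poisson family can be confirmed numerically. The sketch below is illustrative only: the function names, the truncation level `N = 90` and the parameter value `theta = 3.0` are assumptions, not taken from the text. It pushes the Poisson distribution with mean $\theta(\tfrac12)^n + 1$ through one step of $Q$ and compares the result with the Poisson distribution with mean $\theta(\tfrac12)^{n+1} + 1$.

```python
from math import comb, exp, factorial

def q_entry(i, j):
    """One-step transition probability Q_ij of Example 12-41:
    binomial(i, 1/2) thinning followed by an added Poisson(1/2) population."""
    return sum(
        comb(i, k) * 0.5**i * exp(-0.5) * 0.5**(j - k) / factorial(j - k)
        for k in range(min(i, j) + 1)
    )

def poisson_pmf(mean, i):
    return exp(-mean) * mean**i / factorial(i)

def entrance_law(theta, n, size):
    """pi_n = Poisson(theta * 2**-n + 1), truncated to {0, ..., size-1}."""
    mean = theta * 0.5**n + 1.0
    return [poisson_pmf(mean, i) for i in range(size)]

N = 90        # truncation level (assumption)
theta = 3.0   # illustrative parameter value (assumption)
Q = [[q_entry(i, j) for j in range(N)] for i in range(N)]

worst = 0.0
for n in range(-2, 3):
    pi_n = entrance_law(theta, n, N)
    pi_next = entrance_law(theta, n + 1, N)
    for j in range(20):
        pushed = sum(pi_n[i] * Q[i][j] for i in range(N))  # (pi_n Q)_j
        worst = max(worst, abs(pushed - pi_next[j]))
print(worst)
```

The agreement reflects the particle interpretation: thinning a Poisson population of mean $m$ by 1/2 gives a Poisson population of mean $m/2$, and adding an independent Poisson population of mean 1/2 gives mean $m/2 + 1/2$.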

The matrix $Q$ of Example 12-41 may also be used to illustrate phase
multiplicity on the non-negative half-line. Namely, for $\eta \in [0, \infty)$, let
$h^\eta_{(n,i)} = e^{-\eta 2^n} (\eta 2^n + 1)^i$, and define $\pi^\eta_{0,i} = \alpha_i h^\eta_{(0,i)}$, $P^\eta_{n,jk} = Q_{jk} h^\eta_{(n+1,k)} / h^\eta_{(n,j)}$.
If we set $\pi_0 = \alpha$, $P_n = Q$, $\pi_0' = \pi^\eta_0$ and $P_n' = P^\eta_n$ in Proposition 12-33,
then the hypotheses there are clearly satisfied. Thus we have constructed
a large family of one-sided Markov fields with the same
characteristics as the Markov chain with transition matrix $Q$ and
initial distribution $\alpha$.
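For the transformed matrices to be transition matrices, the functions $h^\eta_{(n,i)} = e^{-\eta 2^n}(\eta 2^n + 1)^i$ must satisfy $\sum_k Q_{jk} h^\eta_{(n+1,k)} = h^\eta_{(n,j)}$, and for the initial vector to be a probability vector one needs $\sum_i \alpha_i h^\eta_{(0,i)} = 1$. The following Python sketch checks both identities numerically; the names `q_entry` and `h`, the value `eta = 0.7` and the truncation level `N = 100` are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def q_entry(i, j):
    """One-step transition probability Q_ij of Example 12-41."""
    return sum(
        comb(i, k) * 0.5**i * exp(-0.5) * 0.5**(j - k) / factorial(j - k)
        for k in range(min(i, j) + 1)
    )

def h(eta, n, i):
    """h^eta_(n,i) = exp(-eta * 2**n) * (eta * 2**n + 1)**i."""
    x = eta * 2.0**n
    return exp(-x) * (x + 1.0)**i

N = 100     # truncation level for the sums over states (assumption)
eta = 0.7   # illustrative parameter value (assumption)

# Space-time regularity: sum_k Q_jk h_(n+1,k) = h_(n,j).
worst_rel = 0.0
for n in (-1, 0, 1):
    for j in range(6):
        lhs = sum(q_entry(j, k) * h(eta, n + 1, k) for k in range(N))
        rhs = h(eta, n, j)
        worst_rel = max(worst_rel, abs(lhs - rhs) / rhs)

# Normalization of the initial vector alpha_i * h_(0,i), with alpha_i = e^-1 / i!.
total = sum(exp(-1.0) / factorial(i) * h(eta, 0, i) for i in range(N))
print(worst_rel, total)
```

Both identities follow from the generating function of a row of $Q$, which equals $((1+s)/2)^j e^{-(1-s)/2}$, evaluated at $s = \eta 2^{n+1} + 1$.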

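As a final application of Example 12-41, we exhibit an element of
$\mathscr{G}_Q$ which is not a Markov process. Consider the field given by the
convex combination $\mu = \tfrac{1}{2}\mu_{00} + \tfrac{1}{2}\mu_{11}$. Clearly $\mu \in \mathscr{G}_Q$, and to see that
$\mu$ is not the measure for a Markov process it suffices to check that

$$\Pr[x_2 = 0 \mid x_0 = 0 \wedge x_1 = 0] \ne \Pr[x_2 = 0 \mid x_1 = 0].$$

The $\pi_n$ and $P_n$ for $\mu_{11}$ satisfy $\pi_{0,0} = e^{-4}$, $\pi_{1,0} = e^{-9/2}$, $P_{0,00} = e^{-1} Q_{00}$
and $P_{1,00} = e^{-2} Q_{00}$. Thus

$$\Pr[x_2 = 0 \mid x_0 = 0 \wedge x_1 = 0] = \frac{\tfrac{1}{2}(e^{-1} Q_{00} Q_{00}) + \tfrac{1}{2}(e^{-4} \cdot e^{-1} Q_{00} \cdot e^{-2} Q_{00})}{\tfrac{1}{2}(e^{-1} Q_{00}) + \tfrac{1}{2}(e^{-4} \cdot e^{-1} Q_{00})} = \frac{1 + e^{-6}}{1 + e^{-4}}\, Q_{00},$$

while

$$\Pr[x_2 = 0 \mid x_1 = 0] = \frac{\tfrac{1}{2}(e^{-1} Q_{00}) + \tfrac{1}{2}(e^{-9/2} \cdot e^{-2} Q_{00})}{\tfrac{1}{2} e^{-1} + \tfrac{1}{2} e^{-9/2}} = \frac{1 + e^{-11/2}}{1 + e^{-7/2}}\, Q_{00}.$$

Hence $\mu$ has the two-sided Markov property, but not the one-sided
Markov property.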

We conclude this section by mentioning without proof the deepest 
result to date in the theory of denumerable Markov fields on Z. When 
T has a group structure it is natural to consider the class consisting 
of all those which are invariant under translations. The 

following theorem completely determines the possibilities for the 
translation invariant Markov fields on T = Z with characteristics deter- 
mined by the strictly positive matrix Q' , 



Theorem 12-42: li pe then Q' 7:^ Q for some strictly positive Q 
such that Q1<1. IfQl = l and Q is ergodic, then yS%\ = 1. In 
this case the unique member of is the stationary Markov chain on Z 
with transition matrix Q and = the Q-regular measure a. In all 
other cases = 0 . 

6. Examples of phase multiplicity in higher dimensions 

When $T = \mathbb{Z}^d$ for $d \ge 2$, the conclusion of Theorem 12-40 ceases to
hold. In other words, there are instances of phase multiplicity for
homogeneous potentials $U$ with $S$ finite. This phenomenon is undoubtedly
the most important in the theory of random fields, but an
adequate treatment is far beyond the scope and purpose of this chapter.
Instead, we briefly discuss two examples. Suggestions for further
reading are included in the Additional Notes.

Example 12-43: Tree processes. $S = \{0, 1, \ldots, N\}$ and $T$ is a countable
tree (i.e., $T$ is endowed with a neighbor structure $\partial$ which defines a
connected graph with no loops). Let $P$ be a strictly positive $N$-state
transition matrix with regular measure $\alpha$, and assume that $P$ is $\alpha$-reversible.
The random field $\{x_t\}$ on $T$ with measure $\mu$ is called a tree process for
$(\alpha, P)$ if $\mu$ is defined by the following three properties:

(i) $\Pr[x_t = i] = \alpha_i$ for all $i \in S$, $t \in T$;

(ii) $\Pr[x_{t_0} = i_0 \wedge \cdots \wedge x_{t_l} = i_l] = \alpha_{i_0} P_{i_0 i_1} \cdots P_{i_{l-1} i_l}$
whenever $\{t_0, \ldots, t_l\}$ is a finite path in the tree $T$ and $i_0, \ldots, i_l \in S$;

(iii) $\Pr[x_{t_0} = i_0 \wedge \cdots \wedge x_{t_l} = i_l \mid x_r = i_r \text{ for all } r \in A]$
$= \Pr[x_{t_0} = i_0 \wedge \cdots \wedge x_{t_l} = i_l \mid x_{t_0} = i_0]$
for any finite $A \subset T$ and path $\{t_0, \ldots, t_l\} \subset T$ which intersect
only at site $t_0$, and any $i_0, \ldots, i_l \in S$.

For given $\alpha$ and $P$, conditions (i)-(iii) determine a well-defined and
unique Markov field. According to (i) and (ii) the process behaves like
a reversible Markov chain along paths (reversibility ensures that
cylinder probabilities are independent of the direction we travel along a
path in (ii)), while paths "patch together" because of condition (iii).
As a special case, suppose that S = {0, 1} and T is the tree with three 
neighbors for every site (sometimes called the 3-Bethe lattice). Con- 
sider the tree processes with measures [i' and /x" induced by (a', P') and 
(a", P") respectively, where 




It is not hard to verify that both fields have the same local character- 
istics, so there is phase transition for the potential V corresponding to 
these characteristics. For instance. 



ix'{{xa = 0} I = 0 for all t e da}) = 



imr 
mir + 



32 

33 > 



= 0} I = 0 for all t e da]) = 



^)(i 



im? + (i)(f)® 



32 

33 - 



In this setting, if $P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}$ then $\alpha = \left( \dfrac{q}{p+q}, \dfrac{p}{p+q} \right)$ and
$P$ is always $\alpha$-reversible. The $(\alpha, P)$-process is called attractive when
$p + q \le 1$. Roughly, attractiveness means that a 1 at site $t$ increases
the likelihood of 1's near $t$ (and similarly for 0's). More precisely this
implies that in Theorem 12-30 = 1}) is maximized when k'^ 

consists of “ all Ts ” on K(n). Using this fact it is possible to show that 
in the attractive case there is phase multiplicity for the potential corre- 
sponding to (a, P) if and only if {p — q)^ - 2(p + g) + 1 > 0 and 
(p, q) (J, J). The (a, P)-process is repulsive when p + q > I, and 
in this case one can prove by entirely different methods that phase 
multiplicity occurs if and only if p H- g > f . 
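The two-state formulas above lend themselves to an exact check. The following Python sketch uses rational arithmetic; the parameter values `p = 1/4`, `q = 1/8` and the function names are hypothetical choices for illustration. It verifies that $\alpha = (q/(p+q), p/(p+q))$ is regular and that $P$ is $\alpha$-reversible, and then evaluates the conditional probability of a 1 at a site of the tree with three neighbors per site, given the neighbor values. The weight $\alpha_s \prod_r P_{s j_r}$ used for the joint probability of a site and its neighbors is an assumption derived here from properties (ii) and (iii), not a formula quoted from the text.

```python
from fractions import Fraction
from itertools import product

def two_state_chain(p, q):
    """Transition matrix P = [[1-p, p], [q, 1-q]] and its regular measure alpha."""
    P = [[1 - p, p], [q, 1 - q]]
    alpha = [q / (p + q), p / (p + q)]
    return alpha, P

def prob_one_given_neighbors(alpha, P, neighbors):
    """Pr[x_a = 1 | neighbor values]; state s gets weight alpha_s * prod_r P[s][j_r]."""
    weights = []
    for s in (0, 1):
        w = alpha[s]
        for j in neighbors:
            w *= P[s][j]
        weights.append(w)
    return weights[1] / (weights[0] + weights[1])

p, q = Fraction(1, 4), Fraction(1, 8)   # hypothetical attractive case, p + q < 1
alpha, P = two_state_chain(p, q)

stationary = [sum(alpha[i] * P[i][j] for i in (0, 1)) for j in (0, 1)]  # alpha P
reversible = alpha[0] * P[0][1] == alpha[1] * P[1][0]

# The conditional probability depends only on how many of the 3 neighbors are 1.
by_ones = {}
for nb in product((0, 1), repeat=3):
    by_ones[sum(nb)] = prob_one_given_neighbors(alpha, P, nb)
print(alpha, reversible, by_ones)
```

With these parameters the conditional probability of a 1 increases with the number of neighboring 1's, which is the informal meaning of attractiveness described above.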

Random fields on countable trees T have the advantage that most 
physically meaningful quantities can be computed explicitly; the fact 
that T has no loops enables one to use inductive methods. The natural 
setting for statistical mechanics is $T = \mathbb{Z}^d$, however, and in this case
the theory is immensely more difficult. We summarize some of the
leading results for the simplest Markov fields on the two-dimensional 
integer lattice in our last example. 

Example 12-44: Two-dimensional Ising model. $S = \{0, 1\}$, $T = \mathbb{Z}^2$.
$V$ is a normalized potential of the form:

$$V_{\{a\}}(\omega) = v_0 \text{ whenever } \omega_a = 1, \qquad V_{\{a,b\}}(\omega) = v_1$$

whenever $|a - b| = 1$ and $\omega_a = \omega_b = 1$, with $V_A(\omega) = 0$ in all other
cases. $V$ is attractive if $v_1 > 0$, repulsive otherwise; the intuitive
interpretation is the same as in the previous example. When $V$ is
attractive there is phase multiplicity if and only if $v_0 + 2v_1 = 0$ and
$v_1 > 2 \ln(\sqrt{2} + 1)$. For repulsive $V$ there is phase multiplicity in an
open neighborhood of the line segment $\{(v_0, v_1): v_0 + 2v_1 = 0$ and
$v_1 < K\}$, with $K$ sufficiently negative. Similar results hold in higher
dimensions, though less is known.

7. Problems 

1. Show that if = {0, 1}, T is a finite subset of Z^, and F is a normalized 
neighbor potential, then the energy Hy of F may be expressed as 

Hv(^) “ i 2 2 ^ == {h} ^ 

aeT bed 

for some v = {Vat, g R, |a — 6| ^ 1} satisfying v^d = 

2. Give an example of a finite Gibbs field which cannot be represented in 
terms of a pair potential. 

3. Let 5 be a given neighbor system. The iC-neighbor set d^a [K — 1,2,...) 
oi ae T is defined recursively by d'^a = da, d^a = a, and for K > I, 
d^-^^a = d(d^a), d^'^'^a = {d^^'^a) U {d^a). Thus a Z-neighbor ^ of a is a 
site which can be reached from a in X steps to neighboring sites, and no 
fewer. A random field {xj is called a iC-Markov field (with respect to d) 
if fjL^ = whenever d^a A T, A finite. A potential U is called 
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a ^-neighbor potential if = 0 whenever A contains two sites which 
are not iiT-neighbors. Show that {x^ is JK^-Markov if and only if the 
canonical potential V for {x^ is ii-neighbor. What does this say when 
jfiT = 0 ? [Hint: Define a new neighbor system.] 

4. Suppose that there is a metric d defined on T, and say that {x^} is Z- 

Markov (L > 0) if whenever B{a, L) = {t e T: d(a, t) < L] 

A T. What property for the canonical potential V is equivalent 
to the L-Markov property for {a;J? State and prove a theorem to 
justify your assertion. Describe the A/2-Markov fields on Z^. 

5. Show that if has any neighbor potential U, then is a Markov field. 
Give an example of a Markov field with potential U, such that U is not 
a neighbor potential. [Hint: For the first part, look carefully at the 
proof of Theorem 12-16.] 

6. Let T = Z. Prove that = 0 if 

lim sup = 0 for some j e S. 

n-^<x>UkeS (Q hic 

[Hint: Show that this condition forces Pr[a;o = ^‘] = 0.] 

7. Give an example of a Markov process $\{x_n\}_{n \in \mathbb{Z}}$ which is not extreme in its
class of Markov fields. [Hint: Use the matrix $Q$ of Example 12-41.]

8. Let = (0, 1, . . .}, T = Z. Define inductively by «o = 

= (o£i_i//3) + for ^ > 1. Let 

^ r«y i = 0JeS 

' + H-iy](«Ai) i>l,jeS 

Finally, put 



( ai + ~ ^i) n (afc/3afc + i) n < 0 

fc=-n-l 

n > 0 

Show that a and all of the 7 t„ are strictly positive probability vectors on 
S, and that Q is a strictly positive transition matrix with Q1 = 1. 
Finally, prove that aQ = a, tt^Q = 7Tn + i, and a for n < 0. Thus 

Q has an entrance law which agrees with the stationary one from time 
0 on, but not before time 0. 

9. Let $V$ be any neighbor potential on $\mathbb{Z}$. Show that if $\mu \in \mathscr{E}_V$, then $\{x_n\}$
is a Markov process.

10. Suppose g and h satisfy (l)-(3) of Theorem 12-37 for some Q > 0. 
Show that TTyy j and Pnjk prescribed in that theorem give rise to a well- 
defined field {xy,}. Is \xy} in ? Is it in ? 




NOTES 



Chapter 3: 

Stochastic processes with the martingale property were first studied by 
Lévy [1937]. Lévy considered, in Sections 67 to 70, partial sums of sequences
$\{f_n\}$ such that

$$M[f_{n+1} \mid f_0 \wedge \cdots \wedge f_n] = 0.$$

These are a natural generalization of sums of independent random variables 
with mean 0. He proved theorems such as a central limit theorem suggested 
by comparison with sums of independent random variables. Ville [1939]
recognized the importance of studying processes representing a fair game and 
for which system theorems should hold. He called these processes martin- 
gales. Although he did not prove any convergence theorems, he did prove 
the inequality given in Problem 7. From this he was able to conclude that 
non-negative martingales had finite lim sup with probability one. He made 
application of this to the study of sample paths of coin tossing. In particular,
he proved one half of the law of the iterated logarithm. The basic
convergence theorem, Theorem 3-12, for martingales was proved by Doob
[1940]. In his book on stochastic processes, Doob [1953] introduced sub- 
martingales (called semi-martingales in that book) and made a systematic 
study of the system theorems and convergence theorems for these processes. 
The proof of Proposition 3-11 for martingales is due to Doob [1940]. The 
proof given here and the extension to submartingales is due to Snell [1952]. 
Additional applications of martingale theory to Markov chains may be found 
in Lamperti [1960a] and [1963a]. 

Chapter 4: 

Markov chains with a finite number of states were introduced by Markov 
[1907]. Kolmogorov [1936] considered the case of a denumerable number 
of states. Important contributions in the foundations of Markov chains 
were made by Doeblin [1938]. There are a number of books devoted to the 
study of finite Markov chains. Among these are Fréchet [1938], Romanovskii
[1949], Kemeny and Snell [1960], Lahres [1964], and Gordon [1965]. The
theory of denumerable Markov chains is the subject of a book by Chung
[1960].

Finite random walks have been analyzed in some detail in Kemeny and 
Snell [1960], Chapter 7. See also Kac [1947a]. The books by Spitzer 
[1964] and Kemperman [1961] give detailed studies of Markov chain problems
applied to sums of independent random variables. The class of random 
walks discussed in Example 8 was introduced by Karlin and McGregor 
[1959] who made an extensive study of these processes. There is a large
literature on branching processes. References to this literature as well as an
account of the theory of these processes may be found in a book of Harris 
[1963]. The process called the basic example in this book is often referred 
to as a “renewal process.” 

The recognition of the need for and the importance of system theorems is 
due to Doob. See Doob [1953], Chapter VII. The strong Markov property 
which holds for any denumerable Markov chain does not hold for general 
Markov processes where time is allowed to be continuous and the state space
is the real line. For a discussion of this problem in the more general setting, 
see Blumenthal [1957]. 

A discussion of system theorems and a rather systematic use of these 
theorems in Markov chain theory may also be found in Chung [1960].

Chapter 5: 

Kemeny and Snell [1960], Chapter III, showed that the fundamental
matrix N could be used to obtain moments of many descriptive quantities 
for finite absorbing chains. The extension of this use of N to denumerable 
chains was made in Kemeny and Snell [1961b]. Theorem 5-10 is the analog 
of the Riesz Decomposition Theorem for superregular functions. A 
systematic discussion of results of this type which exploit the analogy 
between superregular functions for a Markov chain and classical super- 
harmonic functions may be found in Feller [1956] and Doob [1959]. The 
proof of Proposition 5-20 is due to Dynkin and Malyutov [1961].

Proposition 5-22 is true even if we drop the hypothesis of finitely many
$k$-values. This theorem is due to Chung and Erdős [1951], and a simplified
proof of this result was given by Chung and Ornstein [1962]. 

Chapter 6: 

Theorem 6-9 is due to Derman [1954]. He proved existence by showing
that aj = is a regular measure. His proof of uniqueness is less elemen- 
tary than ours and uses the Doeblin ratio theorem applied to the chain 
reversed by a. This ratio theorem proved by Doeblin [1938] states for 
recurrent chains that lim^_oo (N\fjN]J\^) exists and is independent of i and k. 
Derman also used the identification of this limit as 

Proposition 6-24 is due to Kac [1947b]. 

Doeblin [1938] proved Proposition 6-32 and then applied limit theorems 
for sums of independent random variables to obtain limit theorems for 
Markov chains. For details of this technique and resulting limit theorems, 
see Chung [1960], pp. 75-106. The converse to Proposition 6-32 is due to 
Yosida and Kakutani [1940]. 

The fact that $\lim_{n \to \infty} P^n$ exists for a noncyclic recurrent chain is due to
Kolmogorov [1936], [1937]. The extension of this result given in Theorem 
6-38 is due to Orey [1962], and the first proof (including Lemmas 6-36 and 
6-37) is a somewhat simplified version of his proof. The proof using the 
Renewal Theorem may be found in Feller [1957]. Theorem 6-43 is new. 

Chapter 7 : 

The details of the connection between Brownian motion and classical 
potential theory are discussed by Knapp [1965]. The recognition of the 
importance of identifying these two theories started with Kakutani [1944].
Doob [1954] made significant extensions of the results of Kakutani by further
identifying martingale and submartingale theory with the theory of harmonic
and superharmonic functions. Important contributions again exploiting
connections between Brownian motion and classical potential theory were
made by Kac [1951]. The next major contribution was made by Hunt
[1957], [1958]. Hunt showed that one could develop a potential theory for 
essentially the most general Markov process. He considers continuous 
time and abstract state space. He showed conversely that under rather 
minimal requirements for a potential theory one can construct a Markov 
process associated with this theory. His work related to the potential 
theory that goes with transient processes. 

Although many of Hunt’s results go over easily to the Markov chain case, 
even for transient chains new problems arise, and a whole new theory must 
be developed for the recurrent case. These extensions were made by 
Kemeny and Snell [1961b] for general Markov chains and by Spitzer [1962] 
for the important class of Markov chains which arise from sums of lattice- 
valued independent random variables. 

The fact that the symmetric random walk in one and two dimensions is 
recurrent, whereas in dimension three or greater it is transient was first 
proved by Polya [1921]. The proof of Proposition 7-10 was supplied by 
Lamperti. 

Chapter 8: 

The notion of $h$-regular function was introduced into the study of potential
theory by Brelot [1956] and the corresponding idea of a function regular in
the $h$-process for chains was discussed in Feller [1956] and Doob [1959].

A discussion of equilibrium potential, equilibrium charge, and capacity as 
they arise in electrostatics and Newtonian potential theory may be found in 
the book of Kellogg [1929]. A somewhat more modern approach may be 
found in Brelot [1959]. In the classical theory the Green’s function which 
plays the role of the matrix N is always symmetric. The fact that there is an 
interesting potential theory even for nonsymmetric operators in probability
was first shown by Hunt [1957], [1958]. 

The results of Sections 1 and 2 of this chapter were for the most part in 
Doob [1959]. Those of Sections 3 and 4 are specializations to the Markov 
chain case of results obtained by Hunt [1957], [1958] for more general 
Markov processes. 

Choquet and Deny [1956-57] investigated the problem of the relation 
between the various potential principles for the case of potentials of the form 
g = Gf, where G is an arbitrary non-negative finite matrix. If G has an 
inverse they proved that the Principles of Balayage and Domination are 
equivalent and that each implies the Principle of Lower Envelope. Here 
every non-negative function $f$ is a charge. They showed further that if $G$
satisfies the Principle of Lower Envelope, then there is a unique permutation
of the columns of $G$ such that the resulting matrix satisfies the Principle of
Balayage. Also they showed that $G$ satisfies the Principle of Balayage if
and only if it is of the form $G = A \sum_{n=0}^{\infty} S^n$, where $A$ is a diagonal matrix
with strictly positive diagonal entries and $S$ is a non-negative matrix. Thus
the most general operator here is only slightly more general than the class of
all matrices of the form $N = (I - Q)^{-1}$, where $Q$ is a finite transient chain.

Some investigation of this problem for denumerable matrices was made by
Kemeny and Snell [1961b].

The notion of energy seems to be significant only in the case of reversible
chains and even here it does not have the nice probabilistic interpretations 
that the other potential theory concepts have. 

The results of Section 8 are taken from Kemeny and Snell [1961b]. The 
idea of developing a potential theory for supermartingales was suggested by 
Doob [1961] where he also indicated the proof of Proposition 8-79. 

Chapter 9: 

The potential theory for recurrent chains discussed in this chapter was 
introduced by Kemeny and Snell [1961b]. 

The existence of the limit in Theorem 9-4 was first proved by Doeblin 
[1938]. The identification of the limit was made by Chung [1950]. The 
present proof is from Kemeny [1962]. 

Theorem 9-7 was proved under a mild assumption in Kemeny and Snell 
[1961b]. The case i = j was proved in general by a method due to Chung in 
Chung [1961] and Kemeny and Snell [1961c]. The general case was proved 
by Kemeny [1963]. 

The fact that all ergodic chains are normal follows from Theorem 4, 
Chapter 1, Section 11 of Chung [1961]. The remaining results in the first 
three sections are taken primarily from Kemeny and Snell [1961b]. 

Proposition 9-65 was first proved by Lamperti [1960b]. Results of the 
form of Propositions 9-67 and 9-68 may be found in Chung [1960] Chapter 1, 
Section 11. 

The notion of strong ergodic chains introduced here is new. The matrix Z 
was introduced in Kemeny and Snell [1960]. It was shown in this book that
the matrix Z for finite ergodic chains could be used to express the moments 
of many interesting descriptive quantities and hence played for recurrent 
chains a role similar to the matrix N for finite absorbing chains. 

All sums of independent random variables processes which form aperiodic 
recurrent Markov chains are normal. This was proved by Kemeny and Snell 
[1961a] for the case of finite variance and in general by Spitzer [1962]. 

The operator K was introduced by Kemeny and Snell [1963b]. Most of
the results of Sections 8 and 9 are taken from this paper. 

The method of associating denumerable chains with electric circuits 
discussed in Section 10 was carried out by Nash-Williams [1959] under 
slightly more restrictions on the chain than we impose. He proved also 
Lemma 9-129. 

Chapter 10: 

The Martin boundary for Markov chains was introduced independently by 
Doob [1959] and Watanabe [1960a]. Doob and Watanabe used the methods 
which were developed in the study of the classical Martin boundary relevant 
to Newtonian potentials. Details of this approach may be found in Brelot 
[1956] and Doob [1957] or Watanabe [1960a]. 

Hunt [1960] gave a new and more probabilistic treatment of Martin 
boundary theory for Markov chains and completed the work of Doob and 
Watanabe in several ways. In particular, he introduced a new class of 
processes called approximate P-chains. These are slightly more general
than the processes we have called extended chains. Our treatment of the
Martin boundary is for the most part a rewriting of Hunt’s paper with more
detail supplied, except that we have used a slightly different definition of
boundary than that listed by Hunt. The difference is as follows. Doob
introduced a metric on the state space, the one we have used if $\pi$ assigns all
its weight to one point, and completed the state space in terms of this
metric. A point was called a boundary point by Doob in the completed
space if it was a limit point of the original states. It is possible for one of
the original states to be such a limit point. To avoid this peculiarity, Hunt
modified Doob’s metric slightly to make such a point into a new point.
Since these new points are always nonminimal points and appear to play no
essential role in the theory, we have followed Doob’s metric. However, we
have chosen to call the boundary simply the new points added by the
completion. The observation that $\pi N > 0$ is the only condition on $\pi$ that
is needed appears in Orey [1964].

One can also use G. Choquet’s theory of convex cones to develop Martin 
boundary theory. This approach has been carried out by Neveu [1964]. 
See also Hennequin and Tortrat [1965]. 

Brelot [1956] showed that the Martin boundary was ideally suited to the 
study of the first boundary problem, or the Dirichlet problem. His approach 
was to generalize the method developed by Perron and Wiener (see Kellogg 
[1929]) for regions in Euclidean space. 

The probabilistic approach to the first boundary problem was first sug- 
gested by Kakutani [1945] and done more generally using the Martin bound- 
ary by Doob [1958]. The method presented in this book is the probabilistic 
approach of Kakutani and Doob. 

The discussion of fine boundary limits follows that of Doob [1957], who 
considered these problems for superharmonic functions using Brownian 
motion theory. 

The Martin boundary has now been worked out for several important 
classes of Markov chains. In particular, Doob, Snell and Williamson [1960] 
have worked out the boundary for general sums of independent random 
variables. Related results may be found in Dynkin and Malyutov [1961]. 
There are close connections between classical moment problems and the 
Martin boundary for certain of these processes. A discussion of this point 
may be found in Watanabe [1960b]. Lamperti and Snell [1963] discussed 
the Martin boundary for the class of random walks introduced by Karlin and
McGregor [1959]. This discussion was generalized by Kemeny [forthcoming].
Finally Blackwell and Kendall [1964] have given a discussion of the Martin 
boundary for the Polya urn scheme. 

The result in Example 3 that the only positive regular functions for the
symmetric random walk in three dimensions are the constants was proved by
Murdoch [1954] by other methods. Murdoch obtained a better estimate of
$N_{0j}$ than that given here. The short proof of the estimate for $N_{0j}$ that we
give was supplied by E. Stein.

The results in Problems 30 to 34 were discovered by Harris [1957] and 
Veech [1963]. 

A point $x$ of $S^*$ is regular for the Dirichlet problem if for each continuous
function $f \ge 0$ on $S^*$ the superregular function $h$ with $f$ as boundary values
has $\lim_{j \to x} h(j) = f(x)$. An equivalent condition on $x$ is that for each open
neighborhood $U$ of $x$, $\lim_{j \to x} \Pr_j[x_v \in S^* - U] = 0$. Knapp [1966] showed
that the set of regular points is a Borel set and gave an example of a chain $P$
with $P1 = 1$ for which the set of regular points was empty.

Chapter 11:

The main results of this chapter were presented in a paper by Kemeny 
and Snell [1963a]. As noted in this paper, these authors are indebted to 
W. A. Veech for the important Lemma 11-11. The recurrent boundary was 
introduced independently by Orey [1964]. 




ADDITIONAL NOTES 



The discussion below deals with some of the developments in the theory 
since the publication of the first edition. Some papers that predate that 
publication are mentioned to put matters in context. Citations point to the 
Additional References except when the bracketed date is followed by “R.” 
The latter citations point to the References section. 

Chapter 12: 

The foundations for the theory of Markov fields and random fields in 
general were developed by Dobruschin (e.g., [1968]) in a series of papers. 
Our treatment of finite random fields and the equivalence theorem for 
Markov and neighbor Gibbs fields, as presented in Sections 2 and 3, is based 
on Griffeath [1973]. K. L. Chung and D. Dawson made helpful improve- 
ments in the presentation. Theorem 12-16 is due in essence to Averintsev 
[1970], though he considered only the case $T = \mathbb{Z}^d$. A series of papers,
culminating in Grimmett [1973], exploited the Möbius inversion formula to
obtain simpler proofs in a more general context. The Martin boundary
approach to infinite Gibbs fields is due to Föllmer [1975a], and is based on
the work of Dynkin [1971]. Their setting is far more general than the one
presented here, so their arguments are not as elementary. The detailed
study of countable Markov fields on Z was initiated by Spitzer [1975a]; much
of Section 5 is based on his paper. Föllmer [1975b] has also treated this
subject. The proof of Proposition 12-32 was supplied by H. Kesten. 
Theorems 12-36 and 12-37 were obtained by Spitzer using tail fields rather 
than the Martin boundary approach. The ratio-limit representation of 
Theorem 12-37, which does not appear in Spitzer ’s paper, is cited by Cox 
[1976]. Example 12-38, due to Spitzer, makes use of Doob, Snell, and 
Williamson [1960R]. The important Theorem 12-40 was discovered by 
Dobruschin [1968]. Example 12-41 is a special case of a family of phase 
transition examples discussed by Cox [1976]. The remarkable Theorem
12-42 is due to Kesten [1976]; the reader is referred to his paper for the proof. 
Spitzer [1974] gives a very nice exposition of many aspects of random field 
theory not discussed here. Another useful reference is Dawson [1974]. 
Tree processes were first studied by Preston [1974], whose book contains 
a wealth of information on random fields. A more recent reference is 
Spitzer [1975b]. A lucid exposition of the Ising model may be found in 
Griffiths [1972]. Problems 7 and 8 are derived from Spitzer [1975a] ; Problem 
9 is based on a construction of S. Kalikow [1976]. 

Corrections to the first edition: 

Pitman [1974] pointed out that Theorem 9-53 was stated incorrectly in 
the first edition. It had stated incorrectly that $g = -Gf$, overlooking the
possible failure of the relation A(^Nf) = (A ^N)f. This theorem and its
consequences have been corrected in the second edition. 

For general ergodic chains we can give no useful condition under which this 
associativity holds. However, for a strong ergodic chain associativity holds 
if f is bounded (and in particular if the potential g is bounded, since f = 
(I - P)g). In fact, it is enough to observe that a < a < ∞. Thus for a 
strong ergodic chain, g = -Gf if the charge f is bounded. 
This conclusion was obtained in a more direct way in Proposition 9-73. 

Martingales: 

Two books that develop the subject of martingales are those by Meyer 
[1972] and Neveu [1972a]. Both books begin with the basic material on 
martingales. Neveu’s contains a short chapter on the optimal stopping 
problem that is discussed later in these notes. The latter part of each book 
begins to reflect the explosion in the subject of martingales that has centered 
around integral inequalities. 

Early developments were by Burkholder [1966] and Gundy [1967]. Suppose 
$(f_n, \mathcal{F}_n)$ is a martingale and $d_n$ is the sequence of differences $d_0 = f_0$, 
$d_n = f_n - f_{n-1}$ for $n \ge 1$, so that $f_n = d_0 + \cdots + d_n$. If $v_n$ is measurable 
with respect to $\mathcal{F}_{n-1}$, then the sequence $g_n$ with 

$$g_n = \sum_{k=0}^{n} v_k d_k$$

is called a transform of $f_n$. It is a martingale if $M[|g_n|] < \infty$ for all n, by 
imitation of the proof of Proposition 3-7. (The case that the $v_n$ are 
characteristic functions arises in the proof of the Upcrossing Lemma and is the case 
of optional sampling.) Burkholder and Gundy deal with questions of 
convergence and integral boundedness of such transforms. The gambling 
interpretation is as follows: A gambler playing a sequence of rounds in a fair 
game can win $d_n$ dollars in round n, and his fortune is then $f_n$. If $\sup_n M[|f_n|] < \infty$, 
his fortune converges to a finite limit $f_\infty$ a.e. and $M[|f_\infty|] \le \sup_n M[|f_n|]$. 
In the transformed game he is allowed to vary the stakes according to his 
past experience; at time n he can win $v_n d_n$ dollars. How can he improve his 
circumstances by choosing $v_n$ suitably? Burkholder’s first theorem is that 
if $\sup_n |v_n| < \infty$ a.e., then $g_n$ converges a.e. to a finite limit $g_\infty$; however, 
$M[|g_\infty|]$ may be infinite. 
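
The transform can be made concrete with a small exact computation. The sketch below is an illustration only: it assumes a fair game with $d_0 = 0$ and $d_n = \pm 1$ equally likely for $n \ge 1$, and the stake rule `stake` (double after a loss, reset after a win) is a hypothetical choice of $v_n$, not one taken from the papers cited. It enumerates all paths and confirms that the transformed game is still fair.

```python
from fractions import Fraction
from itertools import product

def stake(past):
    """Hypothetical predictable stake: double after each loss, reset to 1 after a win.

    The stake depends only on the earlier differences, so it is measurable
    with respect to the past, as the definition of a transform requires.
    """
    v = 1
    for d in past:
        v = 2 * v if d < 0 else 1
    return v

def transform(path):
    """Return g_n = sum_k v_k d_k for a path of differences (d_1, ..., d_n)."""
    return sum(stake(path[:k]) * d for k, d in enumerate(path))

def mean_transform(n):
    """Exact mean of g_n over all 2^n equally likely paths of the fair game."""
    paths = list(product((1, -1), repeat=n))
    return sum(Fraction(transform(p)) for p in paths) / len(paths)

def conditional_increment_means(n):
    """For each history of length n-1, the mean of g_n - g_{n-1} over the next round."""
    out = []
    for past in product((1, -1), repeat=n - 1):
        increments = [transform(past + (d,)) - transform(past) for d in (1, -1)]
        out.append(Fraction(sum(increments), 2))
    return out
```

Every conditional increment mean is zero, which is the martingale property of $g_n$ for this bounded-horizon example; the stakes themselves are unbounded along losing runs, which is the kind of behavior the convergence theorems above have to control.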

In studying a martingale $\{f_n\}$, Burkholder and Gundy work with the 
function $\left(\sum_{n=1}^{\infty} |f_n - f_{n-1}|^2\right)^{1/2}$ and generalizations. This is called the 
S-function; some of its properties of convergence and average size are 
comparable with those of $\{f_n\}$. The prototype for such conclusions is the 
Khintchine-Kolmogorov Theorem: Let $\{y_n\}$ be independent random variables 
with $M[y_n] = 0$ and $M[|y_n|^2] = a_n^2$. If $\sum_{n=1}^{\infty} a_n^2 = a^2 < \infty$, then $\sum y_n$ 
is convergent a.e. and in $L^2$. Conversely if $\sum y_n$ is convergent in $L^2$, then 
$\sum a_n^2 = a^2 < \infty$ and $M[|\sum y_n|^2] = a^2$. There is a corresponding martingale 
result with $f_n - f_{n-1}$ in place of $y_n$. 

There is a parallel between the theory of martingales and some of the 
developments in Euclidean Fourier analysis, and some of the theorems in 
each of the areas motivate theorems in the other. Let $Q_0$ be the unit cube in 
n dimensional space and let $\mathcal{F}_k$ be the partition of $Q_0$ consisting of cubes 
of side $2^{-k}$ with all coordinates of vertices at integral multiples of $2^{-k}$. Now 
let $(f_k, \mathcal{F}_k)$ be a martingale; for example, $f_k = M[f \mid \mathcal{F}_k]$ is a martingale if f 
is integrable on $Q_0$, and there are other examples. Meanwhile, consider 
harmonic functions u in $\{(x, t) \mid x \in R^n, t > 0\}$; for example, the 
Poisson integral of an integrable function on $Q_0$ (or in all of $R^n$) is an example, 
and there are other examples. The parallel is obtained by comparing 
properties of the function $f_k$ in the special martingale with the function 
$u(\cdot, 2^{-k})$, where u is harmonic. 

This parallel was handled rigorously in one case when the martingale 
theorem of Burkholder and Gundy [1970] was generalized by Burkholder, 
Gundy, and Silverstein [1971] to deal with Brownian motion and then to 
prove a Euclidean theorem. 

Except in this one case, however, the idea has been to make the comparison 
and to proceed by analogy. Under the parallel, the martingale S-function 
corresponds to the “Lusin area function” of Fourier analysis if the martingale 
$(f_k, \mathcal{F}_k)$ is general, and to the “Littlewood-Paley g-function” of Fourier 
analysis if the martingale has $f_k = M[f \mid \mathcal{F}_k]$. The parallel also gives useful 
information in dealing with functions of bounded mean oscillation. For an 
account of the martingale results, see Garsia [1973]. Fefferman [1975] has 
given a thorough exposition of the Euclidean results and described the 
parallel in more detail. 



Strong ratio limit property: 

A noncyclic recurrent chain P has the strong ratio limit property (SRLP) 
if there are numbers $\alpha_i > 0$ such that 

$$\lim_{n \to \infty} \frac{P^{(n+m)}_{ij}}{P^{(n)}_{kl}} = \frac{\alpha_j}{\alpha_l}$$

for all i, j, k, l, m. Chung and Erdos [1951 R] showed the SRLP holds for 
sums of independent random variables on the integers, and an example 
reproduced in Chung [1960 R] shows the SRLP fails for a certain non-cyclic 
recurrent P. 

Orey [1961] proved the SRLP holds if $P^{(n+1)}_{00} / P^{(n)}_{00}$ tends to 1 and it holds if 

$$\limsup_{n \to \infty} \frac{P^{(m(n+1))}_{00}}{P^{(mn)}_{00}} \le 1.$$

The latter condition holds for a reversible chain since $P^{(2n)}_{00}$ is non-increasing 
in n, and hence the SRLP holds for reversible chains. 

The result of Chung and Erdos can be interpreted as showing the SRLP 
holds if there is spatial homogeneity of the right kind. Kingman and Orey 
[1964 R] proved the SRLP under a much weaker assumption of spatial 
homogeneity: that $P^{(n)}_{ii} \ge \epsilon$ for all i for some $\epsilon > 0$ and some n. Sums 
of independent random variables clearly have this property. 
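
A small numerical illustration of these ratio statements may be helpful. The sketch below assumes a specific example that is not taken from the text: the "lazy" symmetric walk on the integers with step distribution 1/4, 1/2, 1/4 on -1, 0, +1, which is noncyclic, recurrent, and a sum of independent random variables. The function names are hypothetical. For this walk the regular measure is constant, so every ratio in the SRLP should approach 1.

```python
from fractions import Fraction

# Assumed example: one step of a lazy symmetric walk on the integers.
STEP = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

def n_step_from_zero(n):
    """Exact distribution of the position after n steps from 0, by repeated convolution."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for pos, p in dist.items():
            for s, q in STEP.items():
                new[pos + s] = new.get(pos + s, 0) + p * q
        dist = new
    return dist

def return_prob(n):
    """n-step return probability to the starting state."""
    return n_step_from_zero(n)[0]

def ratio(n, m, j=0, l=0):
    """P^(n+m)_{0j} / P^(n)_{0l}; by spatial homogeneity the starting state is immaterial."""
    return n_step_from_zero(n + m).get(j, 0) / n_step_from_zero(n).get(l, 0)
```

For this walk the n-step return probability is the central binomial coefficient $\binom{2n}{n}$ divided by $4^n$, so the one-step ratio `ratio(n, 1)` equals $(2n+1)/(2n+2)$ exactly, and the return probabilities are non-increasing in n, in line with the remark on reversible chains.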

More recent work has concentrated on transient chains. The SRLP 
requires reformulation in these cases. Pruitt [1965] gives a definition and 
proves theorems analogous to those of Orey [1961]. The book by Orey 
[1971] treats these matters in some detail. See also Freedman [1971]. 



Applied uses of Markov chains: 

Probabilistic functions of finite Markov chains: Let P be a finite Markov 
chain with state space S and starting vector π. Suppose, for each i in S, 
that $F_{i\cdot}$ is a probability measure on a finite set Y. We imagine a process in 
which P takes place unseen in the background and the matrix F is used at 
each time to produce an outcome in Y. Calling the outcomes $y_n$, we have 

(*) $\Pr[y_1 = k_1 \wedge \cdots \wedge y_n = k_n]$ 

For example, a subject in a psychological experiment may have different 
probabilities for making responses according to his state of mind. If we 
imagine his frame of mind as the outcome of a Markov chain, then S is the 
set of frames of mind and Y is the set of responses. 

If each row vector $F_{i\cdot}$ has all its mass in one entry, then the situation is 
that in lumping. The states in S are lumped in some fashion and the 
lumped states are those in Y. The lumped process need not be a Markov 
chain. Such processes have been studied extensively for a long time. See 
Rosenblatt [1971], Chapter III. 

The opposite extreme occurs when all $F_{ik}$, $P_{ij}$, and $\pi_i$ are > 0. The typical 
practical problem that arises is to estimate the parameters π, P, and F if a 
finite sequence of outcomes is all that is known. Specifically one wants 
values of π, P, and F that make (*) a maximum, given $k_1, \ldots, k_n$. Baum 
et al. [1970] give an iterative procedure in the last paragraph of their paper for 
passing from one set of values of π, P, and F to another with the property 
that (*) increases to a critical point. Their theorem that (*) increases to a 
critical point has been used in modeling letter patterns in English words, in 
predicting sunspot behavior, and in anticipating the stock market. As 
indicated above, it also has applications to psychology and sociology. 
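
The following sketch shows how (*) can be evaluated and how one re-estimation step of the kind described might look. It rests on assumptions that are not fixed by the text: the hidden state at the time of the first outcome is drawn from π and produces that outcome, states and outcomes are coded as integers from 0, and the update formulas are the standard ones usually attributed to Baum et al., written without numerical scaling (so only short sequences are practical). All function names are hypothetical.

```python
from itertools import product

def likelihood(pi, P, F, obs):
    """Forward recursion for (*), assuming the first hidden state is drawn from pi."""
    S = range(len(pi))
    alpha = [pi[i] * F[i][obs[0]] for i in S]
    for k in obs[1:]:
        alpha = [sum(alpha[i] * P[i][j] for i in S) * F[j][k] for j in S]
    return sum(alpha)

def brute_force(pi, P, F, obs):
    """The same quantity as an explicit sum over all hidden state sequences."""
    total = 0.0
    for path in product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * F[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= P[path[t - 1]][path[t]] * F[path[t]][obs[t]]
        total += p
    return total

def reestimate(pi, P, F, obs):
    """One unscaled re-estimation step; returns new (pi, P, F)."""
    S, Y, n = range(len(pi)), range(len(F[0])), len(obs)
    # Forward variables a[t][i] and backward variables b[t][i].
    a = [[pi[i] * F[i][obs[0]] for i in S]]
    for t in range(1, n):
        a.append([sum(a[t - 1][i] * P[i][j] for i in S) * F[j][obs[t]] for j in S])
    b = [[1.0 for _ in S] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b[t] = [sum(P[i][j] * F[j][obs[t + 1]] * b[t + 1][j] for j in S) for i in S]
    L = sum(a[n - 1])
    # gamma[t][i]: probability of hidden state i at time t given all outcomes.
    gamma = [[a[t][i] * b[t][i] / L for i in S] for t in range(n)]
    new_pi = list(gamma[0])
    new_P = [[sum(a[t][i] * P[i][j] * F[j][obs[t + 1]] * b[t + 1][j]
                  for t in range(n - 1))
              / (L * sum(gamma[t][i] for t in range(n - 1)))
              for j in S] for i in S]
    new_F = [[sum(gamma[t][i] for t in range(n) if obs[t] == k)
              / sum(gamma[t][i] for t in range(n))
              for k in Y] for i in S]
    return new_pi, new_P, new_F

# Made-up parameters and outcomes, for illustration only.
pi0 = [0.6, 0.4]
P0 = [[0.7, 0.3], [0.4, 0.6]]
F0 = [[0.9, 0.1], [0.2, 0.8]]
outcomes = [0, 0, 1, 0, 1, 1, 0]
```

Applying `reestimate` to `(pi0, P0, F0, outcomes)` and re-evaluating `likelihood` with the returned parameters gives a value no smaller than the starting one, which is the monotonicity property described above.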

Optimal stopping problems: Let $\{y_n\}$ be a denumerable stochastic 
process, and let $x_n = x_n(y_0, y_1, \ldots, y_n)$ be real-valued and measurable. 
Suppose the $x_n$ are integrable. The problem is to find 

$$v = \sup_t M[x_t],$$

where the supremum is taken over all random times t. The number v is the value 
of the $x_n$ process. If the supremum is attained for some t, t is an optimal strategy, 
and a further problem is to describe t. 

We are to regard the $y_n$’s as some observable outcomes and the $x_n$’s as 
rewards. We are allowed to choose the time of obtaining our reward, 
without clairvoyance, and the problem is to maximize the payoff. There is 
an extensive theory in this generality. See Chow, Robbins, and Siegmund 
[1971]. Neveu [1972a] treats the martingale case, beginning with “Le 
problème de Snell,” solved in Snell [1952 R]. 

In many applications the $\{y_n\}$ are the outcome of a Markov chain, and the 
nth reward function is $x_n(y_0, \ldots, y_n) = f(y_n)$, with f a function on the state 
space that is independent of n. The value is simply $v_i = \sup_t M_i[f(y_t)]$. 
If f is bounded, then v is the least nonnegative superregular function ≥ f. 
See Dynkin and Yushkevich [1969], Chapter 3, for a treatment of the problem 
and discussion of strategy. A deeper investigation is the book by Sirjaev 
[1973]. 
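
For a finite chain and a nonnegative reward f, the least superregular function dominating f can be approximated by a simple iteration. The sketch below is one common way to do this, not a method taken from the references above: start from f and repeatedly replace v by the pointwise maximum of f and Pv. The chain (a drunkard's walk on five states with absorbing ends), the reward vector, and the function name are all assumptions chosen for illustration.

```python
def stopping_value(P, f, sweeps=500):
    """Approximate the least superregular majorant of f by iterating v <- max(f, Pv).

    The iterates increase, stay below every superregular function that
    dominates f, and their limit satisfies v = max(f, Pv).
    """
    n = len(f)
    v = list(f)
    for _ in range(sweeps):
        Pv = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [max(f[i], Pv[i]) for i in range(n)]
    return v

# Assumed example: symmetric walk on {0,...,4}, absorbed at 0 and 4.
walk = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
reward = [0.0, 0.0, 0.0, 3.0, 0.0]
value = stopping_value(walk, reward)
```

In this example the value works out to 0, 1, 2, 3, 0 on the five states. A natural strategy is to stop the first time the chain is in a state where the value equals the reward, here the states 0, 3, and 4.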

Recurrent potential theory: 

Orey [1964 R] developed a recurrent potential theory that avoids the 
notion of a normal chain and proceeds from axioms for a potential operator. 
Neveu [1972b] developed recurrent potential theory from a different 
perspective, proceeding as follows. For a function h on S with 0 ≤ h ≤ 1, 
define 

$$U_h = \sum_{n=0}^{\infty} (P D_{1-h})^n P = \sum_{n=0}^{\infty} P (D_{1-h} P)^n,$$

where $D_h$ is the diagonal matrix with $h_i$ as ith diagonal entry. 

When h = 1, $U_h = P$. When h = 0, $U_h = P + P^2 + P^3 + \cdots$. In 
the general case with f ≥ 0, 

$$(U_h f)_i = M_i\left[\sum_{n=1}^{\infty} (1 - h(x_1))(1 - h(x_2)) \cdots (1 - h(x_{n-1})) f(x_n)\right].$$

If we write $U_E$ for $U_h$ when h is the characteristic function of E, we have 

$$(U_E f)_i = M_i\left[\sum_{1 \le n \le t_E} f(x_n)\right],$$

where $t_E$ is the first time $n \ge 1$ at which the chain is in E. 
When f is 1 on E and 0 elsewhere, the right side reduces to the probability 
that the chain started in state i ever returns to the set E. 

Let P be recurrent with positive regular measure α. Neveu proves that 
there is some h on S with 0 ≤ h ≤ 1 such that $U_h h = 1$ and $U_h \ge 1\alpha$. 

Fix such an h, put $V = U_h - 1\alpha$, and define 

$$W = \sum_{n=0}^{\infty} (V D_h)^n V.$$

Clearly W is ≥ 0. The finiteness of W is settled by the facts that $Wh = c1$ 
and $\alpha D_h W = c\alpha$, where c is the constant $(1 - \alpha h)/(\alpha h)$. The operator I + W 
is the potential kernel, and Neveu develops an appropriate theory for it. 

See also Revuz [1975]. 
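
These identities can be checked exactly on a small finite chain. The sketch below is an illustration under assumptions of our own: a three-state reflecting walk, the constant function h = 1/2, and the regular measure α taken to be one fifth of the stationary vector so that $U_h \ge 1\alpha$ holds entrywise. For a finite chain the series defining $U_h$ and W are summed in closed form as $P(I - D_{1-h}P)^{-1}$ and $(I - VD_h)^{-1}V$, which is valid whenever the series converge. The helper names are hypothetical.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def diag(u):
    return [[Fraction(u[i]) if i == j else Fraction(0) for j in range(len(u))]
            for i in range(len(u))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inverse(A):
    """Gauss-Jordan inverse with exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(x) for x in row] + identity(n)[i] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [x - fac * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def U(P, h):
    """U_h = sum_n P (D_{1-h} P)^n, summed as P (I - D_{1-h} P)^{-1}."""
    D = diag([1 - x for x in h])
    return matmul(P, inverse(sub(identity(len(P)), matmul(D, P))))

def W_kernel(P, h, alpha):
    """W = sum_n (V D_h)^n V with V = U_h - 1 alpha, summed as (I - V D_h)^{-1} V."""
    n = len(P)
    Uh = U(P, h)
    V = [[Uh[i][j] - alpha[j] for j in range(n)] for i in range(n)]
    return matmul(inverse(sub(identity(n), matmul(V, diag(h)))), V)

# Assumed example chain and data.
half, quarter = Fraction(1, 2), Fraction(1, 4)
P = [[half, half, 0], [quarter, half, quarter], [0, half, half]]
h = [half, half, half]
alpha = [Fraction(1, 20), Fraction(1, 10), Fraction(1, 20)]  # (1/5) x stationary vector
```

With these choices αh = 1/10, so c = 9, and the computed W satisfies Wh = 9·1 and $\alpha D_h W = 9\alpha$. Taking h to be the characteristic function of a single state gives return probability 1 from every state, as expected for a recurrent chain.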



Transient boundary theory: 

Transient boundary theory for general denumerable Markov chains stands 
about where it was in 1966. 

Dynkin [1969] gave an account of the theory that does not use extended 
chains. For the theory of the exit boundary the idea is to use a 
martingale-upcrossing argument to deal with a superregular measure μ. If 
$f(i) = \mu_i/(\pi N)_i$, the key result is that $\lim_{n \to \infty} f(x_n(\omega))$ exists a.e. on 
infinite paths, provided f satisfies a suitable integrability condition. From this result, 
the result we call Theorem 10-18 follows without reference to any extended 
chains, and the rest of the theory requires no change. 

Athreya and Ney [1972] apply transient boundary theory to branching 
processes in Chapter II of their book. 

Sums of independent random variables: 

Sums of independent random variables for a countable group that is not 
necessarily abelian can be defined just as in the abelian case (see Chapter 4), 
as long as left and right are distinguished carefully. Let P be the transition 
matrix for such a chain, and suppose that the states form a single class. 
Regarding the question of transience vs. recurrence, Kesten [1959] considered 
a symmetric P as a linear operator on square-summable sequences. 
He proved that the group admits such a P with spectral radius one if and 
only if the group is amenable, i.e., there is a non-zero left-invariant positive 
linear functional on the bounded functions on the group. If the spectral 
radius is less than one, then not only does N have finite entries but also N 
is a bounded operator on square-summable sequences; hence recurrence 
implies spectral radius one. Day [1964] removed the hypothesis of symmetry 
in Kesten’s theorem and found further equivalent conditions. His theorem 
has been reproved several times by other authors. 

For the exact question of transience vs. recurrence the results are less 
decisive. In his book Spitzer [1976] settles the case of processes on the 
lattice points in Euclidean space. Dudley [1962] proved that a countable 
abelian group has a recurrent P (with one class of states) if and only if the 
maximum number of linearly independent elements is at most 2. In the 
non-abelian case Kesten [1967] conjectured that the existence of a recurrent 
P for a group is related to the growth in n of the number of elements of the 
group expressible as a product of n generators. Milnor [1968a] showed that 
this growth function is approximately independent of the set of generators; 
he pointed out that the existence of a symmetric P with spectral radius less 
than 1 implies exponential growth, and he gave an example of a solvable 
group and a symmetric P with spectral radius 1 and with exponential growth. 
Milnor [1968b] and Wolf [1968] considered classes of countable groups and 
gave conditions under which the growth function is of polynomial size or of 
exponential size. 

Ney and Spitzer [1966] compute the Martin boundary for transient sums 
of independent random variables on a lattice with nonzero mean. Kesten 
and Spitzer [1965] prove the existence of the potential kernel for recurrent 
sums of independent random variables on countable abelian groups, and they 
consider the Martin boundary. Kesten [1967] generalized this work to 
general countable groups, although the extent to which non-trivial non- 
abelian groups can admit recurrent processes is still not known. 

Derriennic [1975] deals with sums of independent random variables on a 
free group with n > 1 generators. It is assumed that the transition matrix 
has only finitely many nonzero entries in each row and that all states com- 
municate. He shows that such a process is transient and that the boundary 
consists of all “reduced infinite words.” The special case in which the process 
in one step can move from a word only to the product of that word by a 
generator or its inverse was considered earlier by other authors. When in 
the special case all the 2n one-step probabilities are equally likely, the 
resulting example is one that arises in the theory of algebraic groups. 

Cylinder set, 43, 326 

Degenerate, 296 
Denumerable, 1 
Deviation, 411 
Diagonal matrix, 1 
Difference equation, 35 
Dominated convergence, 29 
Double sequence space, 326 
Drunkard’s walk, 82, 114, 116 
Dual, 193 

Dual equilibrium set, 203, 205 
Duality, 136 

Electric circuit, 303 
Elementary continuous function, 407 
Energy, 171, 214, 303 
for random field, 428 
Enlarged chain, 114, 191 
Entrance boundary, 366, 402 
Entrance law, 454 
Entry, 1 

Equilibrium potential, 172, 203, 218, 
300 

Equilibrium set, 172, 203 
Equivalent Gibbs fields, 446 
Ergodic chain, 131, 262 
Ergodic degree, 273 
Ergodic set, 259 
Escape probability, 109 
Exit boundary, 337, 413 
Extended chain, 329, 331 
Extended stochastic process, 326 
Extreme measure, 436 
Extreme point, 348, 367 

Field of sets, 10 
augmented, 12 
Fine boundary function, 362 
Fine topology, 363 
Finite 

boundary, 213 
drunkard’s walk, 82, 114 
energy, 214 
matrix, 1 
product, 192 
random field, 427 
First passage time, 52, 95 
mean of, 143, 146 
Function (column vector), 23 
Fundamental matrix, 107 

Game of chance, 62, 70, 74 
Generalized charge, 299 
Generalized potential, 299 
Gibbs field 
finite, 429 

neighbor, infinite, 435 
Green’s function, 171 

h-Process, 196 
Harmonic, 184 

Harmonic measure, 342, 346, 406 
Hitting probability, 95, 97, 109 
Homogeneous 
field, 446 
potential, 446 

Identically distributed, 61, 146 
Independent, 52, 61, 146 
Independent process, 52, 146 
Independent trials process, 52, 104, 
146, 165, 283 
Index sets, 1 

Infinite drunkard’s walk, 82, 116 
Integrable, 22 
uniformly, 30 
Integral, 22 

Ladder process, 125 



Land of Oz, 81, 164, 321 
Law of Large Numbers, 75 
Lebesgue measure, 56 
Limits, 6 

Local characteristics, 426 

Markov 

chain, 79 

chain property, 79 
field, 427 
process, 79 
property, 79 
property, strong, 88 
property, two-sided, 426 
Martin boundary, see Boundary 
Martingale, 61 
Matrix, 1 
addition, 2 
product, 3 
transition, 80 
Mean, 53 

conditional, 54, 58, 60, 77 
Mean Ergodic Theorem, 130 
Measurable function, 18, 52 
Measurable set, 12, 18 
Measurable statement, 48 
Measure, 11, 23 
Measure space, 11 
Measure, tree, 46 
Minimal, 348, 366 
Monotone convergence, 26 
Multinomial coefficient, 35 

Neighbor 

Gibbs field, 429, 435 
pair potential, 429 
potential, 428 
system, 427 
Noncyclic, 145 
Non-negative, 1, 10 
Normal chain, 252, 253 
Normalized, 347, 366 
potential, 428 
Null chain, 131 

Outcome, 43 

Partition, 54 

cross, 60 
function, 429 

Passage times, see First passage time 
Path, 43 
Period, 144 
Phase 

multiplicity, 436 
transition, 436 
Pólya’s urn model, 77, 399 
Positive, 1 

Potential, 170, 183, 192, 236, 241, 299 
balayage, 172, 212, 260, 266, 302 
equilibrium, 172, 203, 218, 300 
for random field, 428 
generalized, 299 
operator, 171 
pure, 192, 236, 299 
theory, 171 

p-q Random walk, 84, 384 
Probability, 48 
conditional, 50 
space, 11 

Process watched in E, 133 
Process watched starting in E, 328 

Random field, 425 
Random time, 53 
Random variables, 52 

identically distributed, 61, 146 
independent, 61, 146 
Random walk, 84 

(see also Drunkard’s walk, p-q 
Random walk, Reflecting random 
walk, Sums of independent random 
variables, Symmetric random 
walk) 

Ratio Limit Theorem, 243 

Recurrent chain, 102 

Recurrent state, 99 

Reference measure, 436 

Reflecting random walk, 104, 158, 284 

Regular, 86, 184 

Renewal process, 426 

Represent, 304 

Repulsive Markov field, 457 

Reverse, 334 

Reverse chain, 136, 162 

Reversible chain, 308 

Row, 2, 4 

Sequence space, 43, 326 
Set function, 10 
Sigma-finite, 13 
Simple function, 21 



Small set, 258 
Space 

probability, 11 
sequence, 43, 326 
state, 42, 46, 326 
Space-time coin tossing, 394 
Space-time Markov chain, 233 
Space-time process, 105 
Square matrix, 1 
Standard voltage problem, 304 
Starting distribution, 80 
State 

absorbing, 81 
recurrent, 99 
space, 42, 46, 326 
transient, 99 
Stochastic process, 46 
extended, 326 
Stopping time, 69 
Strong ergodic chain, 274 
Strong Markov property, 88 
Submartingale, 61 
Subregular, 86 
Summable, Abel, 36 
Summable, Cesaro, 35 
Sums of independent random variables, 
62, 73, 83, 119, 122, 125, 146, 416 
on the line, 62, 74, 121, 128, 240, 285, 
416 

Superharmonic, 185 
Supermartingale, 61 
Superregular, 86, 185 
Support, 172, 197, 241, 299 
Symmetric difference, 14 
Symmetric random walk, 84, 185, 187, 
315, 386 

Systems theorems, 63, 69, 93 

Tail-field, 89 
T-continuous, 408 
Thermodynamic limit, 439 
Thin set, 365 
Time, 43 
random, 53 
stopping, 69 

Total charge, 171, 202, 241 
Total measure, 11 
Transient chain, 102 
Transient state, 99 
Transition matrix, 80 
Transpose, 2 
Tree measure, 46 
Tree process, 85, 104 
(random field), 456 
Truth set, 41 

Uniformly integrable, 30 
Upcrossing, 65 



Urn model, 77, 399 
Vector, 1, 2 
Weak charge, 256 
Zero-one law, 119 




